Prior Application:



Art Unit: 1633 
Examiner: S. Kaushal 



SIR: This is a request for filing a

■ Continuation Application under 37 C.F.R. § 1.53(b) of pending prior application Serial
No. 09/244,130, filed February 4, 1999, which is a Division of application Serial No.
09/119,024, filed July 20, 1998, which is a Continuation of application Serial No.
08/336,241, filed November 7, 1994, which is a CIP of application Serial No.
07/971,160, filed November 5, 1992, which is a CIP of application Serial No.
07/879,689, filed May 5, 1992, of Bernard DUJON et al. for NUCLEOTIDE
SEQUENCE ENCODING THE ENZYME I-SceI AND THE USES THEREOF



1. Enclosed is a complete copy of the prior application including the oath or
Declaration and drawings, if any, as originally filed. I hereby verify that the
attached papers are a true copy of prior application Serial No. 08/336,241
as originally filed on November 7, 1994.

2. □ Enclosed is a substitute specification under 37 C.F.R. § 1.125.

3. ■ Cancel Claims 2-22.

4. ■ A Preliminary Amendment is enclosed.

5. ■ The filing fee is calculated on the basis of the claims existing in the prior
application as amended at 3 and 4 above.



                                    Number of            Extra
                                    Claims      Basic    Claims    Rate      Fee

Basic Application Filing Fee                                       $690      $  690.00
Total Claims                        42          20       22        x$18         396.00
Independent Claims                  3           3        0         x$78
[X] Presentation of Multiple Dep. Claim(s)                         +$260        260.00
Subtotal                                                                     $ 1346.00
Reduction by 1/2 if small entity
TOTAL APPLICATION FILING FEE                                                 $ 1346.00
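
The fee arithmetic in the table above can be checked with a short sketch. The rates ($690 basic fee, $18 per claim in excess of 20, $78 per independent claim in excess of 3, and a $260 surcharge for presenting multiple dependent claims) are taken from the table itself; the function and parameter names are illustrative assumptions and are not part of the filing.

```python
def filing_fee(total_claims, independent_claims, has_multiple_dependent,
               small_entity=False):
    """Recompute the filing fee from the rates shown in the fee table."""
    basic_fee = 690                                        # basic filing fee
    extra_total = max(total_claims - 20, 0) * 18           # claims over 20
    extra_independent = max(independent_claims - 3, 0) * 78  # independent claims over 3
    multiple_dependent = 260 if has_multiple_dependent else 0
    subtotal = basic_fee + extra_total + extra_independent + multiple_dependent
    # The form halves the subtotal for a small entity; no such status is claimed here.
    return subtotal / 2 if small_entity else subtotal


fee = filing_fee(total_claims=42, independent_claims=3,
                 has_multiple_dependent=True)
print(f"${fee:,.2f}")  # $1,346.00
```

With 42 total claims and 3 independent claims, the sketch gives 690 + 396 + 0 + 260 = 1346, which agrees with the subtotal and total entered on the form.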



6. ■ A check in the amount of $1,346.00 to cover the filing fee is enclosed.



7. ■ The Commissioner is hereby authorized to charge any fees which may be
required including fees due under 37 C.F.R. § 1.16 and any other fees due
under 37 C.F.R. § 1.17, or credit any overpayment during the pendency of
this application to Deposit Account No. 06-0916.

8. ■ Amend the specification by inserting before the first line, the sentence: 

--This is a Continuation of application Serial No. 09/244,130, filed February 4,
1999, which is a Division of application Serial No. 09/119,024, filed July 20,
1998, now U.S. Patent No. 5,948,678, which is a Continuation of application
Serial No. 08/336,241, filed November 7, 1994, now U.S. Patent No.
5,792,632, which is a CIP of application Serial No. 07/971,160, filed
November 5, 1992, now U.S. Patent No. 5,474,896, which is a CIP of
application Serial No. 07/879,689, filed May 5, 1992, now abandoned, all of
which are incorporated herein by reference.--

9. □ New formal drawings are enclosed. 

10. ■ The prior application is assigned of record to: Institut Pasteur at Reel 7450,
Frame 0500, and Universite Pierre et Marie Curie at Reel 7450, Frame 0446.

11. □ Priority of application Serial No. ______, filed on ______ in
______ (country) is claimed under 35 U.S.C. § 119. A certified copy
□ is enclosed or □ is on file in the prior application.



A verified statement claiming small entity status 



□ is enclosed or □ is on file in the prior application. 

The power of attorney in the prior application is to at least one of the 
following: FINNEGAN, HENDERSON, FARABOW, GARRETT & DUNNER, 
LLP., Douglas B. Henderson, Reg. No. 20,291; Ford F. Farabow, Jr., Reg. 
No. 20,630; Arthur S. Garrett, Reg. No. 20,338; Donald R. Dunner, Reg. 
No. 19,073; Brian G. Brunsvold, Reg. No. 22,593; Tipton D. Jennings, IV, 
Reg. No. 20,645; Jerry D. Voight, Reg. No. 23,020; Laurence R. Hefter, Reg. 
No. 20,827; Kenneth E. Payne, Reg. No. 23,098; Herbert H. Mintz, Reg. 
No. 26,691; C. Larry O'Rourke, Reg. No. 26,014; Albert J. Santorelli, Reg. 
No. 22,610; Michael C. Elmer, Reg. No. 25,857; Richard H. Smith, Reg. 
No. 20,609; Stephen L. Peterson, Reg. No. 26,325; John M. Romary, Reg. 
No. 26,331; Bruce C. Zotter, Reg. No. 27,680; Dennis P. O'Reilley, Reg. 
No. 27,932; Allen M. Sokal, Reg. No. 26,695; Robert D. Bajefsky, Reg. 
No. 25,387; Richard L. Stroup, Reg. No. 28,478; David W. Hill, Reg. 
No. 28,220; Thomas L. Irving, Reg. No. 28,619; Charles E. Lipsey, Reg. 
No. 28,165; Thomas W. Winland, Reg. No. 27,605; Basil J. Lewris, Reg. 
No. 28,818; Martin I. Fuchs, Reg. No. 28,508; E. Robert Yoches, Reg. 
No. 30,120; Barry W. Graham, Reg. No. 29,924; Susan Haberman Griffen, 
Reg. No. 30,907; Richard B. Racine, Reg. No. 30,415; Thomas H. Jenkins, 
Reg. No. 30,857; Robert E. Converse, Jr., Reg. No. 27,432; Clair X. Mullen, 
Jr., Reg. No. 20,348; Christopher P. Foley, Reg. No. 31,354; John C. Paul, 
Reg. No. 30,413; David M. Kelly, Reg. No. 30,953; Kenneth J. Meyers, Reg. 
No. 25,146; Carol P. Einaudi, Reg. No. 32,220; Walter Y. Boyd, Jr., Reg. 
No. 31,738; Steven M. Anzalone, Reg. No. 32,095; Jean B. Fordis, Reg. 
No. 32,984; Roger D. Taylor, Reg. No. 28,992; Barbara C. McCurdy, Reg.
No. 32,120; James K. Hammond, Reg. No. 31,964; Richard V. Burgujian, 
Reg. No. 31,744; J. Michael Jakes, Reg. No. 32,824; Thomas W. Banks, 
Reg. No. 32,719; Christopher P. Isaac, Reg. No. 32,616; Bryan C. Diner, 
Reg. No. 32,409; M. Paul Barker, Reg. No. 32,013; Andrew Chanho Sonu, 
Reg. No. 33,457; David S. Forman, Reg. No. 33,694; Vincent P. Kovalick, 
Reg. No. 32,867; James W. Edmondson, Reg. No. 33,871; Michael R. 
McGurk, Reg. No. 32,045; Joann M. Neth, Reg. No. 36,363; Gerson S. 
Panitch, Reg. No. 33,751; Cheri M. Taylor, Reg. No. 33,216; Charles E. Van 
Horn, Reg. No. 40,266; Linda A. Wadler, Reg. No. 33,218; Jeffrey A. 
Berkowitz, Reg. No. 36,743; Michael R. Kelly, Reg. No. 33,921; James B.
Monroe, Reg. No. 33,971; Doris Johnson Hines, Reg. No. 34,629; Allen R. 
Jensen, Reg. No. 28,224; Lori Ann Johnson, Reg. No. 34,498; and David A. 
Manspeizer, Reg. No. 37,540. 

The power appears in the original declaration of the prior application. 

Since the power does not appear in the original declaration, a copy of the 
power in the prior application is enclosed. 
16. ■ Please address all correspondence to FINNEGAN, HENDERSON,
FARABOW, GARRETT and DUNNER, L.L.P., 1300 I Street, N.W.,
Washington, D.C. 20005-3315.

17. □ Recognize as associate attorney

(name, address & Reg. No.) 

18. □ Also enclosed is 



PETITION FOR EXTENSION . If any extension of time is necessary for the filing of this 
application, including any extension in the parent application, Serial No. 09/244,130, 
filed February 4, 1999, for the purpose of maintaining copendency between the parent
application and this application, and such extension has not otherwise been requested, 
such an extension is hereby requested, and the Commissioner is authorized to charge 
necessary fees for such an extension to our Deposit Account No. 06-0916. A duplicate 
copy of this paper is enclosed for use in charging the deposit account. 

FINNEGAN, HENDERSON, FARABOW, 
GARRETT & DUNNER, L.L.P. 



Kenneth J. Meyers
Reg. No.: 25,146
Date: January 27, 2000
PATENT

Attorney Docket No. 3495.0111-11
IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



In re Application of: 



Bernard DUJON et al. 



Serial Number: Unassigned 
(Continuation of Serial No.: 09/244,130) 

Filed: January 27, 2000 



Group Art Unit: Unassigned 
Examiner: Unassigned 



For: NUCLEOTIDE SEQUENCE ENCODING
THE ENZYME I-SceI AND THE USES THEREOF

Assistant Commissioner for Patents 
Washington, D.C. 20231 



Sir: 



PRELIMINARY AMENDMENT 

Prior to the examination of the above application, please amend this application as
follows:



IN THE SPECIFICATION: 



On page 4, line 12, after "Dujon et al., 1985;" insert --(ref. 7 and ref. A4);--; and
line 13, after "Colleaux et al., 1986;" insert --(ref. 8)--.
Page 5, second line from bottom, after "TAA", insert --(SEQ ID NO:1)--; and
last line, after "*" insert --(SEQ ID NO:2)--.
Page 7, line 5, after "gene." insert --(SEQ ID NOS: 3 and 4)--;
line 8, after "enzyme." insert --(SEQ ID NOS: 5 and 2)--;
line 11, after "recognition." insert --(SEQ ID NOS: 6, 7, and 8)--;
line 15, after "box." insert --(SEQ ID NOS: 9 through 16)--;
line 17, after "I-Sce I." insert --(SEQ ID NO:2)--;



line 19, after "endonucleases." insert --(SEQ ID NOS:17 through 44)--; and
line 28, after "pAF100." insert --(SEQ ID NOS:45 through 50)--.
Page 10, line 21, delete "HR" and insert therefor --homologous recombination
(HR)--.



Page 13, line 18, delete "SSA" and insert therefor --single-strand annealing (SSA)--.
Page 14, line 22, delete "a)" and insert therefor --a1)--.
Page 18, substitute the following for the last two lines:

--5' TAGGGATAA I CAGGGTAAT 3' (SEQ ID NO:51)
3' ATCCC T TATTGTCCCATTA 5' (SEQ ID NO:52)--.
Page 48, line 2, delete " 'polybrain' " and insert therefor --polybrene (hexadimethrine
bromide)--.

Page 72, line 3, delete "[11]" and insert therefor --[11B]--.

Page 72, line 23, delete "[13]" and insert therefor --[13B]--.

Page 72, line 27, delete "[12]" and insert therefor --[12B]--.

Page 80, line 14, delete "[29]" and insert therefor --[29B]--.

Page 91, line 24, delete "Varmus, H. and Brown, P. 1989. Retroviruses" and insert
therefor --Varmus, H. and Brown, P., Retroviruses in Mobile DNA, 58-108 (Douglas E. Berg
and Martha H. Howe eds., 1989).--.

After making the above amendments, please move pages 50-58 and insert these 
pages after page 88 and before page 89 of the application. Then please renumber the 
pages as follows: 



Renumber pages 50-58 as pages 80-88, respectively; and renumber pages 59-88 
as pages 50-79, respectively. 
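
The renumbering requested above amounts to a simple mapping from old page numbers to new ones, sketched below as a check that the two ranges swap places without gaps or collisions. The function name is an illustrative assumption; only the page ranges come from the instruction itself.

```python
def new_page_number(old_page):
    """Map an original page number to its number after the requested move."""
    if 50 <= old_page <= 58:
        return old_page + 30   # pages 50-58 become pages 80-88
    if 59 <= old_page <= 88:
        return old_page - 9    # pages 59-88 become pages 50-79
    return old_page            # all other pages keep their numbers


moved = {old: new_page_number(old) for old in range(50, 89)}
# Every page number from 50 to 88 is used exactly once after renumbering.
assert sorted(moved.values()) == list(range(50, 89))
print(moved[50], moved[58], moved[59], moved[88])  # 80 88 50 79
```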

After page 93, and before page 94, please insert the attached pages titled 
"SEQUENCE LISTING". 
On pages 89-93, renumber references "1" through "37" to read --1B-- through
--37B--.



IN THE CLAIMS: 



Please cancel claim 1.

Please add the following new claims: 

--23. A recombinant mammalian or plant chromosome comprising an
endonuclease site selected from the group consisting of HO endonuclease and Group I
intron encoded endonuclease sites.

24. The recombinant chromosome of claim 23, wherein said endonuclease site
is a Group I intron encoded endonuclease site.

25. The recombinant chromosome of claim 24, wherein said endonuclease site
is selected from the group consisting of Class I I-endonuclease sites, Class II
I-endonuclease sites, Class III I-endonuclease sites, Class IV I-endonuclease sites, and
Class V I-endonuclease sites.

26. The recombinant chromosome of claim 25, wherein said endonuclease site
is a Class I I-endonuclease site.

27. The recombinant chromosome of claim 26, wherein said endonuclease site
is selected from the group consisting of I-SceI, I-SceIV, I-CsmI, and I-PanI sites.

28. The recombinant chromosome of claim 27, wherein said endonuclease site
is an I-SceI site.

29. The recombinant chromosome of any of claims 23-28, wherein said
chromosome is a mouse chromosome.

30. A recombinant mammalian or plant cell comprising a recombinant
chromosome comprising an endonuclease site selected from the group consisting of HO
endonuclease and Group I intron encoded endonuclease sites.

31. The recombinant cell of claim 30, wherein said endonuclease site is a Group
I intron encoded endonuclease site.

32. The recombinant cell of claim 31, wherein said endonuclease site is selected
from the group consisting of Class I I-endonuclease sites, Class II I-endonuclease sites,
Class III I-endonuclease sites, Class IV I-endonuclease sites, and Class V I-endonuclease
sites.

33. The recombinant cell of claim 32, wherein said endonuclease site is a Class
I I-endonuclease site.

34. The recombinant cell of claim 33, wherein said endonuclease site is selected
from the group consisting of I-SceI, I-SceIV, I-CsmI, and I-PanI sites.

35. The recombinant cell of claim 34, wherein said endonuclease site is an I-SceI
site.

36. The recombinant cell of any of claims 30-35, wherein said cell is a mouse
cell.

37. The recombinant cell of claim 36, wherein said cell is a mouse stem cell.

38. A retroviral vector comprising an endonuclease site selected from the group
consisting of HO endonuclease and Group I intron encoded endonuclease sites.

39. The retroviral vector of claim 38, wherein said endonuclease site is a Group
I intron encoded endonuclease site.

40. The retroviral vector of claim 39, wherein said endonuclease site is selected
from the group consisting of Class I I-endonuclease sites, Class II I-endonuclease sites,
Class III I-endonuclease sites, Class IV I-endonuclease sites, and Class V I-endonuclease
sites.

41. The retroviral vector of claim 40, wherein said endonuclease site is a Class
I I-endonuclease site.

42. The retroviral vector of claim 41, wherein said endonuclease site is selected
from the group consisting of I-SceI, I-SceIV, I-CsmI, and I-PanI sites.

43. The retroviral vector of claim 42, wherein said endonuclease site is an I-SceI
site.

44. The retroviral vector of any of claims 38-43, wherein said retroviral vector is 
a Moloney Murine Leukemia Virus vector.-- 
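
The claim set above can be modelled as a small dependency table, which gives one plausible explanation of the "42" total claims entered on the fee form although only 22 claims (23-44) are presented. The sketch assumes the usual fee-counting convention for multiple dependent claims (a multiple dependent claim counts as the number of claims it refers to, and a claim depending from it counts the same way); that convention is stated here as an assumption, and all names in the code are illustrative.

```python
def build_claims():
    """Return {claim number: claims it refers to} for new claims 23-44."""
    claims = {}
    for first in (23, 30, 38):            # independent claims
        claims[first] = []
        for n in range(first + 1, first + 6):
            claims[n] = [n - 1]           # simple chain of dependent claims
    claims[29] = list(range(23, 29))      # "any of claims 23-28"
    claims[36] = list(range(30, 36))      # "any of claims 30-35"
    claims[37] = [36]                     # depends from multiple dependent claim 36
    claims[44] = list(range(38, 44))      # "any of claims 38-43"
    return claims


def fee_count(claims, n):
    """Number of claims that claim n represents under the assumed convention."""
    refs = claims[n]
    if not refs:
        return 1                          # independent claim
    if len(refs) > 1:
        return len(refs)                  # multiple dependent claim
    return fee_count(claims, refs[0])     # inherits the count of its parent


claims = build_claims()
total = sum(fee_count(claims, n) for n in claims)
independent = sum(1 for refs in claims.values() if not refs)
print(len(claims), total, independent)  # 22 42 3
```

Under these assumptions the 18 ordinary claims count once each and claims 29, 36, 37, and 44 count six times each, for 18 + 24 = 42 total claims and 3 independent claims.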

REMARKS 

This amendment is intended to correct typographical errors, incomplete citations, 
and ambiguous references to listed citations in the specification. Support for this 
amendment is found throughout the specification. New claims 23-44 find support in the 
specification and original claims, for example as follows: 

New Claim          Supported in Specification at:

23-44              Page 1, lines 9-16;
                   page 3, line 15, through page 4, line 23;
                   page 6, lines 12-23;
                   page 26, line 5, through page 27, line 25;
                   page 29, line 16, through page 30, line 20;
                   page 35, line 16, through page 36, line 13;
                   page 36, line 22, through page 38, line 16;
                   page 38, lines 9-16;
                   page 47, line 3, through page 48, line 10;
                   page 49, lines 8-12;
                   Fig. 6; and
                   Fig. 13.

24, 31, and 39     Page 26, line 5, through page 27, line 25; and
                   Fig. 6.

25, 32, and 40     Page 27, lines 3-25; and
                   Fig. 6.

26, 33, and 41     Page 27, lines 3-13; and
                   Fig. 6.

27, 34, and 42     Page 27, lines 3-10; and
                   Fig. 6.

28, 35, and 43     Page 27, lines 3-7;
                   Fig. 6; and
                   Fig. 13.

29 and 36          Page 38, lines 9-16; and
                   page 47, line 3, through page 48, line 10.

44                 Page 38, lines 9-16;
                   page 47, line 3, through page 48, line 10; and
                   Fig. 13.

Accordingly, no new matter is entered by way of this amendment and entry is
respectfully requested.

Applicants submit herewith a Sequence Listing and have amended the
specification to conform with the requirements of 37 C.F.R. §§ 1.821-1.825. Applicants
request the use of the computer-readable copy submitted in U.S. application Serial No.
08/336,241 filed November 7, 1994, now U.S. Patent No. 5,792,632 issued August 11,
1998.

I hereby state that the contents of the enclosed Sequence Listing and the
computer-readable copy of the Sequence Listing in U.S. application Serial No.
08/336,241 filed November 7, 1994, now U.S. Patent No. 5,792,632 issued August 11,
1998, submitted in accordance with 37 C.F.R. § 1.821(c) and (e), respectively, are the
same.

I further state that the submission, filed in accordance with 37 C.F.R.
§ 1.821(g), does not contain new matter.

Please grant any extensions of time required to enter this response and
charge any additional required fees to our deposit account 06-0916.



Respectfully submitted 



FINNEGAN, HENDERSON, FARABOW, 
GARRETT & DUNNER, L.L.P.



Dated: January 27, 2000 
UNITED STATES PATENT APPLICATION



OF 



ANDRE CHOULIKA, ARNAUD PERRIN,
BERNARD DUJON and JEAN-FRANCOIS NICOLAS



FOR 



NUCLEOTIDE SEQUENCE ENCODING THE ENZYME I-SceI AND
THE USES THEREOF






for studying the action of genes, and for studying the intricate
interaction of cells in the immune system. The whole
animal is the ultimate assay system for manipulated genes,
which direct complex biological processes.

Transgenic animals can provide a general assay for 
functionally dissecting DNA sequences responsible for tissue 
specific or developmental regulation of a variety of genes. 
In addition, transgenic animals provide useful vehicles for 
expressing recombinant proteins and for generating precise 
animal models of human genetic disorders. 

For a general discussion of gene cloning and expression
in animals and animal cells, see Old and Primrose, "Principles
of Gene Manipulation," Blackwell Scientific Publications,
London (1989), page 255 et seq.

Transgenic lines, which have a predisposition to spe- 
cific diseases and genetic disorders, are of great value in 
the investigation of the events leading to these states. It 
is well known that the efficacy of treatment of a genetic 
disorder may be dependent on identification of the gene de- 
fect that is the primary cause of the disorder. The discov- 
ery of effective treatments can be expedited by providing an 
animal model that will lead to the disease or disorder, which 
will enable the study of the efficacy, safety, and mode of 
action of treatment protocols, such as genetic recombination. 

One of the key issues in understanding genetic recombination
is the nature of the initiation step. Studies of homologous
recombination in bacteria and fungi have led to the
proposal of two types of initiation mechanisms. In the first
model, a single-strand nick initiates strand assimilation and
branch migration (Meselson and Radding 1975). Alternatively,
a double-strand break may occur, followed by a repair mechanism
that uses an uncleaved homologous sequence as a template
(Resnick and Martin 1976). This latter model has gained support
from the fact that integrative transformation in yeast
is dramatically increased when the transforming plasmid is
linearized in the region of chromosomal homology (Orr-Weaver,
Szostak and Rothstein 1981) and from the direct observation
of a double-strand break during mating type interconversion
of yeast (Strathern et al. 1982). Recently, double-strand
breaks have also been characterized during normal yeast meiotic
recombination (Sun et al. 1989; Alani, Padmore and
Kleckner 1990).

Several double-strand endonuclease activities have been
characterized in yeast: HO and intron encoded endonucleases
are associated with homologous recombination functions, while
others still have unknown genetic functions (Endo-SceI,
Endo-SceII) (Shibata et al. 1984; Morishima et al. 1990). The HO
site-specific endonuclease initiates mating-type
interconversion by making a double-strand break near the YZ
junction of MAT (Kostriken et al. 1983). The break is subsequently
repaired using the intact HML or HMR sequences,
resulting in ectopic gene conversion. The HO recognition
site is a degenerate 24 bp non-symmetrical sequence
(Nickoloff, Chen, and Heffron 1986; Nickoloff, Singer and
Heffron 1990). This sequence has been used as a
"recombinator" in artificial constructs to promote intra- and
intermolecular mitotic and meiotic recombination (Nickoloff,
Chen and Heffron 1986; Kolodkin, Klar and Stahl 1986; Ray et
al. 1988; Rudin and Haber 1988; Rudin, Sugarman, and Haber
1989).

The two site-specific endonucleases, I-SceI (Jacquier
and Dujon 1985) and I-SceII (Delahodde et al. 1989; Wenzlau
et al. 1989), that are responsible for intron mobility in
mitochondria, initiate a gene conversion that resembles the
HO-induced conversion (see Dujon 1989 for review). I-SceI,
which is encoded by the optional intron Sc LSU.1 of the 21S
rRNA gene, initiates a double-strand break at the intron insertion
site (Macreadie et al. 1985; Dujon et al. 1985;
Colleaux et al. 1986). The recognition site of I-SceI extends
over an 18 bp non-symmetrical sequence (Colleaux et al.
1988). Although the two proteins are not obviously related
by their structure (HO is 586 amino acids long while I-SceI
is 235 amino acids long), they both generate 4 bp staggered
cuts with 3' OH overhangs within their respective recognition
sites. It has been found that a mitochondrial intron-encoded
endonuclease, transcribed in the nucleus and translated in
the cytoplasm, generates a double-strand break at a nuclear
site. The repair events induced by I-SceI are identical to
those initiated by HO.
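
The properties of the I-SceI site described above (18 bp, non-symmetrical, 4 bp staggered cut leaving 3' overhangs) can be illustrated with a short sketch. The top-strand sequence is the one given in the preliminary amendment for page 18 (SEQ ID NO:51). The two cut positions are an assumption, read from the spacing of the two strands in that amendment, and the variable names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


top = "TAGGGATAACAGGGTAAT"                # 5'->3', SEQ ID NO:51
bottom_3_to_5 = top.translate(COMPLEMENT)  # paired strand written 3'->5'

# A symmetrical (palindromic) site would equal its own reverse complement.
is_symmetrical = reverse_complement(top) == top

# Assumed cut positions, counted along the top strand:
TOP_CUT = 9      # top strand cut after TAGGGATAA
BOTTOM_CUT = 5   # bottom strand cut opposite the position after TAGGG
overhang = top[BOTTOM_CUT:TOP_CUT]        # single-stranded 3' extension

print(len(top), bottom_3_to_5, is_symmetrical, overhang)
```

Under these assumed cut positions the overhang is four nucleotides long, consistent with the 4 bp staggered cut stated in the text, and the computed bottom strand matches the second strand listed in the amendment (SEQ ID NO:52).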

In summary, there exists a need in the art for reagents
and methods for providing transgenic animal models of human
diseases and genetic disorders. The reagents can be based on
the restriction enzyme I-SceI and the gene encoding this enzyme.
In particular, there exists a need for reagents and
methods for replacing a natural gene with another gene that
is capable of alleviating the disease or genetic disorder.

SUMMARY OF THE INVENTION

Accordingly, this invention aids in fulfilling these
needs in the art. Specifically, this invention relates to an
isolated DNA encoding the enzyme I-SceI. The DNA has the
following nucleotide sequence:



     ATG CAT ATG AAA AAC ATC AAA AAA AAC CAG GTA ATG 2670
     M   H   M   K   N   I   K   K   N   Q   V   M   12

2671 AAC CTC GGT CCG AAC TCT AAA CTG CTG AAA GAA TAC AAA TCC CAG CTG ATC GAA CTG AAC 2730
  13 N   L   G   P   N   S   K   L   L   K   E   Y   K   S   Q   L   I   E   L   N   32

2731 ATC GAA CAG TTC GAA GCA GGT ATC GGT CTG ATC CTG GGT GAT GCT TAC ATC CGT TCT CGT 2790
  33 I   E   Q   F   E   A   G   I   G   L   I   L   G   D   A   Y   I   R   S   R   52

2791 GAT GAA GGT AAA ACC TAC TGT ATG CAG TTC GAG TGG AAA AAC AAA GCA TAC ATG GAC CAC 2850
  53 D   E   G   K   T   Y   C   M   Q   F   E   W   K   N   K   A   Y   M   D   H   72

2851 GTA TGT CTG CTG TAC GAT CAG TGG GTA CTG TCC CCG CCG CAC AAA AAA GAA CGT GTT AAC 2910
  73 V   C   L   L   Y   D   Q   W   V   L   S   P   P   H   K   K   E   R   V   N   92

2911 CAC CTG GGT AAC CTG GTA ATC ACC TGG GGC GCC CAG ACT TTC AAA CAC CAA GCT TTC AAC 2970
  93 H   L   G   N   L   V   I   T   W   G   A   Q   T   F   K   H   Q   A   F   N   112

2971 AAA CTG GCT AAC CTG TTC ATC GTT AAC AAC AAA AAA ACC ATC CCG AAC AAC CTG GTT GAA 3030
 113 K   L   A   N   L   F   I   V   N   N   K   K   T   I   P   N   N   L   V   E   132

3031 AAC TAC CTG ACC CCG ATG TCT CTG GCA TAC TGG TTC ATG GAT GAT GGT GGT AAA TGG GAT 3090
 133 N   Y   L   T   P   M   S   L   A   Y   W   F   M   D   D   G   G   K   W   D   152

3091 TAC AAC AAA AAC TCT ACC AAC AAA TCG ATC GTA CTG AAC ACC CAG TCT TTC ACT TTC GAA 3150
 153 Y   N   K   N   S   T   N   K   S   I   V   L   N   T   Q   S   F   T   F   E   172

3151 GAA GTA GAA TAC CTG GTT AAG GGT CTG CGT AAC AAA TTC CAA CTG AAC TGT TAC GTA AAA 3210
 173 E   V   E   Y   L   V   K   G   L   R   N   K   F   Q   L   N   C   Y   V   K   192

3211 ATC AAC AAA AAC AAA CCG ATC ATC TAC ATC GAT TCT ATG TCT TAC CTG ATC TTC TAC AAC 3270
 193 I   N   K   N   K   P   I   I   Y   I   D   S   M   S   Y   L   I   F   Y   N   212

3271 CTG ATC AAA CCG TAC CTG ATC CCG CAG ATG ATG TAC AAA CTG CCG AAC ACT ATC TCC TCC 3330
 213 L   I   K   P   Y   L   I   P   Q   M   M   Y   K   L   P   N   T   I   S   S   232

3331 GAA ACT TTC CTG AAA TAA
 233 E   T   F   L   K   *
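
The relationship between the two rows of each pair above (codons on top, one-letter amino acids below) follows the universal genetic code. A minimal sketch of that translation is given here, applied only to the first and last rows of the listing; the helper name `translate` and the compact construction of the codon table are illustrative choices, not part of the application.

```python
# Standard ("universal") genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string (spaces allowed) codon by codon; '*' marks a stop."""
    dna = dna.replace(" ", "").upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


first_row = "ATG CAT ATG AAA AAC ATC AAA AAA AAC CAG GTA ATG"
last_row = "GAA ACT TTC CTG AAA TAA"
print(translate(first_row))  # MHMKNIKKNQVM
print(translate(last_row))   # ETFLK*
```

The same function can be applied row by row to the remaining codon lines to check them against the amino acid lines beneath.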
This invention also relates to a DNA sequence comprising
a promoter operatively linked to the DNA sequence of the invention
encoding the enzyme I-SceI.

This invention further relates to an isolated RNA
complementary to the DNA sequence of the invention encoding
the enzyme I-SceI and to the other DNA sequences described
herein.

In another embodiment of the invention, a vector is
provided. The vector comprises a plasmid, bacteriophage, or
cosmid vector containing the DNA sequence of the invention
encoding the enzyme I-SceI.

In addition, this invention relates to E. coli or
eukaryotic cells transformed with a vector of the invention.

Also, this invention relates to transgenic animals containing
the DNA sequence encoding the enzyme I-SceI and cell
lines cultured from cells of the transgenic animals.

In addition, this invention relates to a transgenic
organism in which at least one restriction site for the
enzyme I-SceI has been inserted in a chromosome of the
organism.

Further, this invention relates to a method of
genetically mapping a eukaryotic genome using the enzyme
I-SceI.

This invention also relates to a method for in vivo site
directed recombination in an organism using the enzyme
I-SceI.
BRIEF DESCRIPTION OF THE DRAWINGS

This invention will be more fully described with reference
to the drawings in which:

Figure 1 depicts the universal code equivalent of the
mitochondrial I-SceI gene.

Figure 2 depicts the nucleotide sequence of the
invention encoding the enzyme I-SceI and the amino acid
sequence of the natural I-SceI enzyme.

Figure 3 depicts the I-SceI recognition sequence and
indicates possible base mutations in the recognition site and
the effect of such mutations on stringency of recognition.

Figure 4 is the nucleotide sequence and deduced amino
acid sequence of a region of plasmid pSCM525. The nucleotide
sequence of the invention encoding the enzyme I-SceI is enclosed
in the box.

Figure 5 depicts variations around the amino acid sequence
of the enzyme I-SceI.

Figure 6 shows Group I intron encoding endonucleases and
related endonucleases.

Figure 7 depicts yeast expression vectors containing the
synthetic gene for I-SceI.

Figure 8 depicts the mammalian expression vector pRSV
I-SceI.

Figure 9 is a restriction map of the plasmid pAF100.
(See also YEAST, 6:521-534, 1990, which is relied upon and
incorporated by reference herein).

Figures 10A and 10B show the nucleotide sequence and restriction
sites of regions of the plasmid pAF100.
Figure 11 depicts an insertion vector pTSMω, pTKMω,
and pTTcω containing the I-SceI site for E. coli and other
bacteria.

Figure 12 depicts an insertion vector pTYW6 containing
the I-SceI site for yeast.

Figure 13 depicts an insertion vector pMLV LTR SAPLZ
containing the I-SceI site for mammalian cells.

Figure 14 depicts a set of seven transgenic yeast
strains cleaved by I-SceI. Chromosomes from FY1679 (control)
and from seven transgenic yeast strains with I-SceI sites
inserted at various positions along chromosome XI were
treated with I-SceI. DNA was electrophoresed on 1% agarose
(SeaKem) gel in 0.25 X TBE buffer at 130 V and 12°C on a
Rotaphor apparatus (Biometra) for 70 hrs using 100 sec to
40 sec decreasing pulse times. (A) DNA was stained with
ethidium bromide (0.2 µg/ml) and transferred to a Hybond N
(Amersham) membrane for hybridization. (B) 32P labelled
cosmid pUKG040, which hybridizes with the shortest fragment of
the set, was used as a probe. Positions of chromosome XI and
shorter chromosomes are indicated.

Figure 15 depicts the rationale of the nested
chromosomal fragmentation strategy for genetic mapping. (A)
Positions of I-SceI sites are placed on the map, irrespective
of the left/right orientation (shorter fragments are
arbitrarily placed on the left). Fragment sizes as measured
from PFGE (Fig. 14A) are indicated in kb (note that the sum
of the two fragment sizes varies slightly due to the limit of
precision of each measurement). (B) Hybridization with the
probe that hybridizes the shortest fragment of the set
determines the orientation of each fragment (see Fig. 14B).
Fragments that hybridize with the probe (full lines) have
been placed arbitrarily to the left. (C) Transgenic yeast
strains have been ordered with increasing sizes of
hybridizing chromosome fragments. (D) Deduced I-SceI map
with minimal and maximal size of intervals indicated in kb
(variations in some intervals are due to limitations of PFGE
measurements). (E) Chromosome subfragments are used as
probes to assign each cosmid clone to a given map interval or
across a given I-SceI site.

Figure 16 depicts mapping of the I-SceI sites of
transgenic yeast strains by hybridization with left end and
right end probes of chromosome XI. Chromosomes from FY1679
(control) and the seven transgenic yeast strains were
digested with I-SceI. Transgenic strains were placed in
order as explained in Fig. 15. Electrophoresis conditions
were as in Fig. 14. 32P labelled cosmids pUKG040 and pUKG066
were used as left end and right end probes, respectively.

Figure 17 depicts mapping of a cosmid collection using
the nested chromosomal fragments as probes. Cosmid DNAs were
digested with EcoRI and electrophoresed on 0.9% agarose
(SeaKem) gel at 1.5 V/cm for 14 hrs, stained with ethidium
bromide and transferred to a Hybond N membrane. Cosmids were
placed in order from previous hybridizations to help
visualize the strategy. Hybridizations were carried out
serially on three identical membranes using left end nested
chromosome fragments purified on PFGE (see Fig. 16) as
probes. A: ethidium bromide staining (ladder is the BRL "1kb
ladder"), B: membrane #1, probe: Left tel to A302 site,
C: membrane #1, probe: Left tel to M57 site, D: membrane #2,
probe: Left tel to H81 site, E: membrane #2, probe: Left tel
to T62 site, F: membrane #3, probe: Left tel to G41 site, G:
membrane #3, probe: Left tel to D304 site, H: membrane #3,
probe: entire chromosome XI.

Figure 18 depicts a map of the yeast chromosome XI as
determined from the nested chromosomal fragmentation
strategy. The chromosome is divided into eight intervals
(with sizes indicated in kb, see Fig. 15D) separated by seven
I-SceI sites (E40, A302 ...). Cosmid clones falling either
within intervals or across a given I-SceI site are listed
below intervals or below interval boundaries, respectively.
Cosmid clones that hybridize with selected genes used as
probes are indicated by letters (a-i). They localize the
gene with respect to the I-SceI map and allow comparison with
the genetic map (top).

Figure 19 depicts diagrams of successful site directed 
homologous recombination experiments performed in yeast. 

Figure 20. Experimental design for the detection of HR
induced by I-Sce I. a) Maps of the 7.5 kb tk-PhleoLacZ
retrovirus (G-MtkPL) and of the 6.0 kb PhleoLacZ retrovirus
(G-MPL). SA is splice acceptor site. G-MtkPL sequences (from
G-MtkPL virus) contain PhleoLacZ fusion gene for positive
selection of infected cells (in phleomycin-containing medium)
and tk gene for negative selection (in gancyclovir-containing
medium). G-MPL sequences (from G-MPL virus) contain only
PhleoLacZ sequences. b) Maps of proviral structures following
retroviral integration of G-MtkPL and G-MPL. I-Sce I
PhleoLacZ LTR duplicates, placing I-Sce I PhleoLacZ sequences
in the 5' LTR. The virus vector (which functions as a
promoter trap) is transcribed (arrow) by a flanking cellular
promoter, P. c) I-Sce I creates two double strand breaks
(DSBs) in host DNA liberating the central segment and leaving
broken chromosome ends that can pair with the donor plasmid,
pVRneo (d). e) Expected recombinant locus following HR.

Figure 21. A. Scheme of pG-MPL. SD and SA are splice donor
and splice acceptor sites. The structure of the unspliced
5.8 kb (genomic) and spliced 4.2 kb transcripts is shown
below. Heavy bar is 32P radiolabelled LacZ probe (P). B. RNA
Northern blot analysis of a pG-MPL transformed ψ-2 producer
clone using polyadenylated RNA. Note that the genomic and
the spliced mRNA are produced at the same high level.

Figure 22. A. Introduction of duplicated I-Sce I recognition 
sites into the genome of mammalian cells by retrovirus 
integration. Scheme of G-MPL and G-MtkPL proviruses which 
illustrates positions of the two LTRs and pertinent 
restriction sites. The sizes of Bcl I fragments and of 
I-Sce I fragments are indicated. Heavy bar is ³²P 
radiolabelled LacZ probe (P). B. Southern blot analysis of 
cellular DNA from NIH3T3 fibroblast cells infected by 
G-MtkPL and PCC7-S multipotent cells infected by G-MPL. Bcl I 
digests demonstrating LTR mediated PhleoLacZ duplication; 
I-Sce I digests demonstrating faithful duplication of I-Sce I 
sites.

Figure 23. Verification of recombination by Southern blot. 
A.: Expected fragment sizes in kilobase pairs (kb) of 
provirus at the recombinant locus. 1) The parental proviral 
locus. Heavy bar (P) is the ³²P radioactively labelled probe 
used for hybridization. 2) A recombinant derived after 
cleavage at the two I-Sce I sites followed by gap repair 
using pVRneo (double-site homologous recombination, DsHR). 
3) A recombination event initiated by the cleavage at the 
I-Sce I sites in the left LTR (single-site homologous 
recombination, SsHR). B.: Southern analysis of DNA from 
NIH3T3/G-MtkPL clones 1 and 2, PCC7-S/G-MPL clones 3 and 4 
and transformants derived from cotransfection with 
pCMV(I-Sce I+) and pVRneo (1a, 1b, 2a, 3a, 3b and 4a). Kpn I 
digestion of the parental DNA generates a 4.2 kb fragment 
containing LacZ fragment. Recombinants 1a and 3a are 
examples of DsHR. Recombinants 1b, 2a, 3b and 4a are examples 
of SsHR.

Figure 24. Verification of recombination by Northern 
blot analyses. A.: Expected structure and sizes (in kb) of 
RNA from PCC7-S/G-MPL clone 3 cells before (top) and after 
(bottom) I-Sce I induced HR with pVRneo. Heavy bars P1 and 
P2 are ³²P radioactively labelled probes. B.: Northern blot 
analysis of the PCC7-S/G-MPL clone 3 recombinant (total RNA). 
Lane 3 is parental cells, lane 3a recombinant cells. The 
first two lanes were probed with LacZ P1, the last two lanes 
were probed with neo P2. Parental PCC7-S/G-MPL clone 3 cells 
express a 7.0 kb LacZ RNA, as expected from trapping of a 
cellular promoter leading to expression of a cellular-viral 
fusion RNA. The recombinant clone does not express this LacZ 
RNA but expresses a neo RNA of 5.0 kb, corresponding to the 
size expected for an accurate replacement of PhleoLacZ by 
the neo gene.

Figure 25. Types of recombination events induced by 
I-Sce I DSBs. a) Schematic drawing of the structure of the 
recombination substrate. The G-MtkPL provirus has two LTRs, 
each containing an I-Sce I recognition site and a PhleoLacZ 
gene. The LTRs are separated by viral sequences containing 
the tk gene. The phenotype of G-MtkPL containing cells is 
Phleo^R, Gls^S, β-Gal^+. b) Possible modes of intra-chromosomal 
recombination. 1) The I-Sce I endonuclease cuts the I-Sce I 
site in the 5' LTR. The 5' part of U3 of the 5' LTR can pair 
and recombine with its homologous sequence in the 3' LTR (by 
SSA). 2) The I-Sce I endonuclease cuts the I-Sce I site in 
the 3' LTR. The 3' part of U3 of the 3' LTR can pair and 
recombine with its homologous sequence in the 5' LTR (by SSA). 
3) The I-Sce I endonuclease cuts I-Sce I sites in the two 
LTRs. The two free ends can religate (by an end-joining 
mechanism). The resulting recombination product in each of 
the three models is a solitary LTR (see right side). No 
modification would occur in the cellular sequences flanking 
the integration site. c) The I-Sce I endonuclease cuts the 
I-Sce I sites in the two LTRs. The two free ends can be 
repaired (by a gap repair mechanism) using the homologous 
chromosome. On the right, the resulting recombination 
product is the deletion of the proviral integration locus.

Figure 26. Southern blot analysis of DNA from 
NIH3T3/G-MtkPL 1 and 2, and PhleoLacZ- recombinants derived 
from transfections with pCMV(I-Sce I+) selected in 
Gancyclovir containing medium. a) Expected fragment sizes in 
kilobase pairs (kbp) of parental provirus after digestion 
with Pst I endonuclease. Pst I digestion of the parental DNA 
NIH3T3/G-MtkPL 1 generates two fragments of 10 kbp and of the 
parental NIH3T3/G-MtkPL 2 two fragments of 7 kbp and 9 kbp. 
b) Southern blot analysis of DNA digested by Pst I from 
NIH3T3/G-MtkPL 1, and recombinants derived from transfection 
with pCMV(I-Sce I+) (1.1 to 1.5). c) Southern blot analysis 
of DNA digested by Pst I from NIH3T3/G-MtkPL 2, and 
recombinants derived from transfection with pCMV(I-Sce I+) 
(2.1 to 2.6). 

Heavy bar is ³²P radiolabelled LacZ probe (P).

Figure 27. Southern blot analysis of DNA from 
NIH3T3/G-MtkPL 1 and 2, and PhleoLacZ+ recombinants derived 
from transfections with pCMV(I-Sce I+) and pCMV(I-Sce I-) and 
selection in Phleomycin and Gancyclovir containing medium. 
a) Expected fragment sizes in kbp of parental provirus after 
digestion with Pst I or Bcl I endonuclease. Pst I digestion 
of the parental DNA NIH3T3/G-MtkPL 1 generates two fragments 
of 10 kbp. Bcl I digestion of the parental DNA 
NIH3T3/G-MtkPL 2 generates three fragments of 9.2 kbp, 7.2 kbp 
and 6.0 kbp. a2) Expected fragment sizes in kbp of 
recombinants after digestion with Pst I or Bcl I endonuclease. 
Pst I digestion of DNA of the recombinant derived from 
NIH3T3/G-MtkPL 1 generates one fragment of 13.6 kbp. Bcl I 
digestion of the DNA of the recombinants derived from 
NIH3T3/G-MtkPL 2 generates two fragments of 9.2 kbp and 6.0 
kbp. b) Southern blot analysis of DNA from NIH3T3/G-MtkPL 1, 
and recombinants derived from transfection with 
pCMV(I-Sce I-) and pCMV(I-Sce I+) (1c, 1d). c) Southern 
analysis of DNA from NIH3T3/G-MtkPL 2, and transformants 
derived from transfection with pCMV(I-Sce I-) (2a, 2b) and 
pCMV(I-Sce I+) (2c to 2h). 

Heavy bar is ³²P radiolabelled LacZ probe (P).

Figure 28. Figure 28 is a diagram illustrating the 
loss of heterozygosity by the insertion or presence of an 
I-Sce I site, expression of the enzyme I-Sce I, cleavage at 
the site, and repair of the double strand break at the site 
with the corresponding chromatid.

Figure 29. Figure 29 is a diagram illustrating 
conditional activation of a gene. An I-Sce I site is 
integrated between tandem repeats, and the enzyme I-Sce I is 
expressed. The enzyme cleaves the double stranded DNA at the 
I-Sce I site. The double strand break is repaired by single 
strand annealing, yielding an active gene.

Figure 30. Figure 30 is a diagram illustrating one 
step rearrangement of a gene by integration of an I-Sce I 
site or by use of an I-Sce I site present in the gene. A 
plasmid having either one I-Sce I site within an inactive 
gene, or two I-Sce I sites at either end of an active gene 
without a promoter, is introduced into the cell. The cell 
contains an inactive form of the corresponding gene. The 
enzyme I-Sce I cuts the plasmid at the I-Sce I sites, and 
recombination between the chromosome and the plasmid yields 
an active gene replacing the inactive gene.

Figure 31. Figure 31 is a diagram illustrating the 
duplication of a locus. An I-Sce I site and a distal part of 
the locus are inserted into the gene by classical gene 
replacement. The I-Sce I site is cleaved by I-Sce I enzyme, 
and the break is repaired by homologous sequences. This 
results in duplication of the entire locus. 

Figure 32. Figure 32 is a diagram illustrating the 
deletion of a locus. Two I-Sce I sites are added to flank 
the locus to be deleted. The I-Sce I enzyme is expressed, 
and the sites are cleaved. The two remaining ends recombine, 
deleting the locus between the two I-Sce I sites. 

Figure 33. Figure 33 is a diagram of plasmid 
pG-MtkΔPAPL showing the restriction sites. The plasmid is 
constructed by deletion of the polyadenylation region of the 
tk gene from the pG-MtkPL plasmid.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
The genuine mitochondrial gene (ref. 8) cannot be 
expressed in E. coli, yeast or other organisms due to the 
peculiarities of the mitochondrial genetic code. A "universal 
code equivalent" has been constructed by in vitro 
site-directed mutagenesis. Its sequence is given in Fig. 1. Note 
that all non-universal codons (except two CTN) have been 
replaced together with some codons extremely rare in E. coli. 

The universal code equivalent has been successfully 
expressed in E. coli and determines the synthesis of an 
active enzyme. However, expression levels remained low due to 
the large number of codons that are extremely rare in E. 
coli. Expression of the "universal code equivalent" has been 
detected in yeast.

To optimize gene expression in heterologous systems, a 
synthetic gene has been designed to encode a protein with the 
genuine amino acid sequence of I-SceI using, for each codon, 
that most frequently used in E. coli. The sequence of the 
synthetic gene is given in Fig. 2. The synthetic gene was 
constructed in vitro from eight synthetic oligonucleotides 
with partial overlaps. Oligonucleotides were designed to 
allow mutual priming for second strand synthesis by Klenow 
polymerase when annealed by pairs. The elongated pairs were 
then ligated into plasmids. Appropriately placed restriction 
sites within the designed sequence allowed final assembly of 
the synthetic gene by in vitro ligation. The synthetic gene 
has been successfully expressed in both E. coli and yeast. 
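
By way of illustration only, the codon-replacement principle described above (one preferred codon per amino acid) can be sketched as follows. The codon table and the peptide are hypothetical placeholders chosen for the example; they are not the usage data or the sequence actually employed for the synthetic gene of Fig. 2.

```python
# Hypothetical preferred-codon table: one codon per amino acid, chosen here
# only for illustration. Real E. coli codon-usage data should replace it.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}

# Reverse lookup, valid only for DNA produced with the table above.
CODON_TO_AA = {codon: aa for aa, codon in PREFERRED_CODON.items()}


def back_translate(protein: str) -> str:
    """Encode a protein using the single preferred codon for each residue."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper())


def translate(dna: str) -> str:
    """Decode DNA that was built by back_translate (table codons only)."""
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    return "".join(CODON_TO_AA[codon] for codon in codons)


peptide = "MHKLGDAY"  # hypothetical peptide, not the I-SceI sequence
gene = back_translate(peptide)
print(gene, translate(gene) == peptide)
```

The round trip through `translate` checks that the amino acid sequence is preserved while the codon choice is changed, which is the property required of the synthetic gene.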

1. I-SceI Gene Sequence 

This invention relates to an isolated DNA sequence 
encoding the enzyme I-SceI. The enzyme I-SceI is an 
endonuclease. The properties of the enzyme (ref. 14) are as 
follows: 

I-SceI is a double-stranded endonuclease that cleaves 
DNA within its recognition site. I-SceI generates a 4 bp 
staggered cut with 3' OH overhangs.

Substrate: Acts only on double-stranded DNA. Substrate 
DNA can be relaxed or negatively supercoiled. 

Cations: Enzymatic activity requires Mg++ (8 mM is 
optimum). Mn++ can replace Mg++, but this reduces the 
stringency of recognition. 

Optimum conditions for activity: high pH (9 to 10), 
temperature 20-40°C, no monovalent cations. 

Enzyme stability: I-SceI is unstable at room 
temperature. The enzyme-substrate complex is more stable than 
the enzyme alone (presence of recognition sites 
stabilizes the enzyme).

The enzyme I-SceI has a known recognition site (ref. 
14). The recognition site of I-SceI is a non-symmetrical 
sequence that extends over 18 bp as determined by systematic 
mutational analysis. The sequence reads: (arrows indicate 
cuts) 

5' TAGGGATAA↓CAGGGTAAT 3' 
3' ATCCC↑TATTGTCCCATTA 5' 

The recognition site corresponds, in part, to the upstream 
exon and, in part, to the downstream exon of the intron plus 
form of the gene.
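
As a minimal sketch, the 4 bp staggered cut with 3' OH overhangs can be modelled on the 18 bp site given above. The cut offsets used here (after base 9 on the upper strand and after position 5 on the lower strand, counted from the left) are assumptions taken from the general literature on I-SceI rather than values stated in this text, and the function names are illustrative.

```python
SITE = "TAGGGATAACAGGGTAAT"  # upper strand, 5'->3', as given in the text

# Assumed cut offsets, counted from the left end of the site.
TOP_CUT = 9      # upper strand is cut after its 9th base
BOTTOM_CUT = 5   # lower strand is cut after its 5th position

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Return the base-paired strand, written 3'->5' under the input."""
    return seq.translate(_COMPLEMENT)


def cleave(site: str, top_cut: int, bottom_cut: int):
    """Split a duplex into left and right fragments and report the overhang."""
    bottom = complement(site)
    left = (site[:top_cut], bottom[:bottom_cut])
    right = (site[top_cut:], bottom[bottom_cut:])
    # Bases of the upper strand left unpaired at the 3' end of the left fragment.
    overhang = site[bottom_cut:top_cut]
    return left, right, overhang


left, right, overhang = cleave(SITE, TOP_CUT, BOTTOM_CUT)
print(left, right, overhang)
```

With these offsets each fragment carries a four-nucleotide single-stranded 3' extension, consistent with the staggered cut described for the enzyme.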

The recognition site is partially degenerate: single 
base substitutions within the 18 bp long sequence result in 
either complete insensitivity or reduced sensitivity to the 
enzyme, depending upon position and nature of the substitu- 
tion. 

The stringency of recognition has been measured on: 
-1- mutants of the site, 
-2- the total yeast genome (Saccharomyces 
cerevisiae, genome complexity is 1.4 x 10^7 bp). Data 
are unpublished.
Results are: 

-1- Mutants of the site: As shown in Fig. 3, 
there is a general shifting of stringency, i.e., mutants 
severely affected in Mg++ become partially affected in 
Mn++, mutants partially affected in Mg++ become 
unaffected in Mn++. 

-2- Yeast: In magnesium conditions, no cleavage is 
observed in normal yeast. In the same condition, DNA 
from transgenic yeasts is cleaved to completion at the 
artificially inserted I-SceI site and no other cleavage 
site can be detected. If magnesium is replaced by 
manganese, five additional cleavage sites are revealed 
in the entire yeast genome, none of which is cleaved to 
completion. Therefore, in manganese the enzyme reveals 
an average of 1 site for ca. 3 million base pairs (5/ 
1.4 x 10^7 bp).

Definition of the recognition site: important 
bases are indicated in Fig. 3. They correspond to bases 
for which severely affected mutants exist. Notice 
however that: 

-1- All possible mutations at each position have 
not been determined; therefore a base that does not 
correspond to a severely affected mutant may still be 
important if another mutant was examined at this very 
same position. 

-2- There is no clear-cut limit between a very 
important base (all mutants are severely affected) and a 
moderately important base (some of the mutants are 
severely affected) . There is a continuum between ex- 
cellent substrates and poor substrates for the enzyme. 

The expected frequency of natural I-SceI sites in a 
random DNA sequence is, therefore, equal to (0.25)^18 or 
(1.5 x 10^-11). In other words, one should expect one 
natural site for the equivalent of ca. 20 human genomes, 
but the frequency of degenerate sites is more difficult 
to predict.
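
The arithmetic behind these estimates can be checked with a short calculation. The human genome size of 3 x 10^9 bp is an assumed round figure introduced for the example; the other numbers are those given in the text.

```python
SITE_LENGTH = 18          # bp, length of the recognition site
YEAST_GENOME_BP = 1.4e7   # genome complexity quoted in the text
MN_EXTRA_SITES = 5        # additional sites revealed in manganese
HUMAN_GENOME_BP = 3e9     # assumption: round figure for one human genome


def exact_site_probability(length: int) -> float:
    """Chance that a random position matches one exact site of this length."""
    return 0.25 ** length


def genomes_per_site(length: int, genome_bp: float) -> float:
    """Number of genomes that must be scanned to expect one exact site."""
    return (4 ** length) / genome_bp


p_site = exact_site_probability(SITE_LENGTH)
human_equivalents = genomes_per_site(SITE_LENGTH, HUMAN_GENOME_BP)
bp_per_mn_site = YEAST_GENOME_BP / MN_EXTRA_SITES

print(f"{p_site:.2e} per position")
print(f"about {human_equivalents:.0f} human genomes per exact site")
print(f"one manganese site per {bp_per_mn_site:.1e} bp")
```

Under these assumptions the exact-site probability is close to 1.5 x 10^-11 and the manganese figure is close to one site per 3 million base pairs, in line with the values stated above.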

I-SceI belongs to a "degenerate" subfamily of the 
two-dodecapeptide family. Conserved amino acids of the 
dodecapeptide motifs are required for activity. In 
particular, the aspartic residues at position 9 of the 
two dodecapeptides cannot be replaced, even with 
glutamic residues. It is likely that the dodecapeptides 
form the catalytic site or part of it.

Consistent with the recognition site being 
non-symmetrical, it is likely that the endonucleolytic 
activity of I-SceI requires two successive recognition 
steps: binding of the enzyme to the downstream half of 
the site (corresponding to the downstream exon) followed 
by binding of the enzyme to the upstream half of the 
site (corresponding to the upstream exon). The first 
binding is strong, the second is weaker, but the two are 
necessary for cleavage of DNA. In vitro, the enzyme can 
bind the downstream exon alone as well as the intron-exon 
junction sequence, but no cleavage results. 

The evolutionarily conserved dodecapeptide motifs of 
intron-encoded I-SceI are essential for endonuclease 
activity. It has been proposed that the role of these motifs 
is to properly position the acidic amino acids with respect 
to the DNA sequence recognition domains of the enzyme for the 
catalysis of phosphodiester bond hydrolysis (ref. P3).

The nucleotide sequence of the invention, which encodes 
the natural I-SceI enzyme, is shown in Fig. 2. The nucleotide 
sequence of the gene of the invention was derived by 
dideoxynucleotide sequencing. The base sequences of the 
nucleotides are written in the 5' → 3' direction. Each of 
the letters shown is a conventional designation for the 
following nucleotides: 

A Adenine 
G Guanine 
T Thymine 
C Cytosine. 

It is preferred that the DNA sequence encoding the 
enzyme I-SceI be in a purified form. For instance, the 
sequence can be free of human blood-derived proteins, human 
serum proteins, viral proteins, nucleotide sequences encoding 
these proteins, human tissue, human tissue components, or 
combinations of these substances. In addition, it is 
preferred that the DNA sequence of the invention is free of 
extraneous proteins and lipids, and adventitious 
microorganisms, such as bacteria and viruses. The essentially purified 
and isolated DNA sequence encoding I-SceI is especially 
useful for preparing expression vectors.

Plasmid pSCM525 is a pUC12 derivative, containing an 
artificial sequence encoding the DNA sequence of the 
invention. The nucleotide sequence and deduced amino acid 
sequence of a region of plasmid pSCM525 are shown in Fig. 4. 
The nucleotide sequence of the invention encoding I-SceI is 
enclosed in the box. The artificial gene is a BamHI-SalI 
piece of DNA sequence of 723 base pairs, chemically 
synthesized and assembled. It is placed under tac promoter 
control. The DNA sequence of the artificial gene differs 
from the natural coding sequence or its universal code 
equivalent described in Cell (1986), Vol. 44, pages 521-533. 
However, the translation product of the artificial gene is 
identical in sequence to the genuine omega-endonuclease 
except for the addition of a Met-His at the N-terminus. It 
will be understood that this modified endonuclease is within 
the scope of this invention.

Plasmid pSCM525 can be used to transform any suitable E. 
coli strain and transformed cells become 
ampicillin-resistant. Synthesis of the omega-endonuclease is obtained 
by addition of I.P.T.G. or an equivalent inducer of the 
lactose operon system. 

A plasmid identified as pSCM525 containing the enzyme 
I-SceI was deposited in E. coli strain TG1 with the Collection 
Nationale de Cultures de Microorganismes (C.N.C.M.) of 
Institut Pasteur in Paris, France on November 22, 1990, under 
culture collection deposit Accession No. I-1014. The 
nucleotide sequence of the invention is thus available from this 
deposit.

The gene of the invention can also be prepared by the 
formation of 3' → 5' phosphate linkages between nucleoside 
units using conventional chemical synthesis techniques. For 
example, the well-known phosphodiester, phosphotriester, and 
phosphite triester techniques, as well as known modifications 
of these approaches, can be employed. Deoxyribonucleotides 
can be prepared with automatic synthesis machines, such as 
those based on the phosphoramidite approach. Oligo- and 
polyribonucleotides can also be obtained with the aid of RNA 
ligase using conventional techniques. 

This invention of course includes variants of the DNA 
sequence of the invention exhibiting substantially the same 
properties as the sequence of the invention. By this it is 
meant that DNA sequences need not be identical to the 
sequence disclosed herein. Variations can be attributable to 
single or multiple base substitutions, deletions, or 
insertions or local mutations involving one or more nucleotides 
not substantially detracting from the properties of the DNA 
sequence as encoding an enzyme having the cleavage properties 
of the enzyme I-SceI.

Fig. 5 depicts some of the variations that can be made 
around the I-SceI amino acid sequence. It has been 
demonstrated that the following positions can be changed without 
affecting enzyme activity: 

positions -1 and -2 are not natural. The two amino 
acids are added due to cloning strategies. 
positions 1 to 10: can be deleted. 
position 36: G is tolerated. 
position 40: M or V are tolerated. 
position 41: S or N are tolerated. 
position 43: A is tolerated. 
position 46: V or N are tolerated. 
position 91: A is tolerated. 
positions 123 and 156: L is tolerated. 
position 223: A and S are tolerated. 

It will be understood that enzymes containing these 
modifications are within the scope of this invention.

Changes to the amino acid sequence in Fig. 5 that have 
been demonstrated to affect enzyme activity are as follows: 



position 19: L to S 
position 38: I to S or N 
position 39: G to D or R 
position 40: L to Q 
position 42: L to R 
position 44: D to E, G or ... 
position 45: A to E or D 
position 46: Y to D 
position 47: I to R or N 
position 80: L to S 
position 144: D to E 
position 145: D to E 
position 146: G to E 
position 147: G to S
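
For illustration, the two lists above can be held in a simple lookup that reports whether a given substitution has been described as tolerated, as affecting activity, or not at all. Only a subset of the listed changes is encoded, and the names `TOLERATED`, `AFFECTING` and `classify` are hypothetical.

```python
# Substitutions reported above as tolerated (position -> replacement residues).
TOLERATED = {
    36: {"G"}, 40: {"M", "V"}, 41: {"S", "N"}, 43: {"A"}, 46: {"V", "N"},
    91: {"A"}, 123: {"L"}, 156: {"L"}, 223: {"A", "S"},
}

# Subset of the substitutions reported above as affecting enzyme activity.
AFFECTING = {
    19: {"S"}, 38: {"S", "N"}, 40: {"Q"}, 42: {"R"}, 45: {"E", "D"},
    47: {"R", "N"}, 144: {"E"}, 145: {"E"}, 146: {"E"}, 147: {"S"},
}


def classify(position: int, new_residue: str) -> str:
    """Report how a substitution is described in the lists above."""
    residue = new_residue.upper()
    if residue in TOLERATED.get(position, set()):
        return "tolerated"
    if residue in AFFECTING.get(position, set()):
        return "affects activity"
    return "not reported"


for position, residue in [(40, "M"), (40, "Q"), (145, "E"), (100, "A")]:
    print(position, residue, classify(position, residue))
```

A position such as 40 appears in both lists, so the outcome depends on the replacement residue and not on the position alone.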

It will also be understood that the present invention is 
intended to encompass fragments of the DNA sequence of the 
invention in purified form, where the fragments are capable 
of encoding enzymatically active I-SceI. 

The DNA sequence of the invention coding for the enzyme 
I-SceI can be amplified in the well known polymerase chain 
reaction (PCR), which is useful for amplifying all or 
specific regions of the gene. See, e.g., S. Kwok et al., J. 
Virol., 61:1690-1694 (1987); U.S. Patent 4,683,202; and U.S. 
Patent 4,683,195. More particularly, DNA primer pairs of 
known sequence positioned 10-300 base pairs apart that are 
complementary to the plus and minus strands of the DNA to be 
amplified can be prepared by well known techniques for the 
synthesis of oligonucleotides. One end of each primer can be 
extended and modified to create restriction endonuclease 
sites when the primer is annealed to the DNA. The PCR 
reaction mixture can contain the DNA, the DNA primer pairs, four 
deoxyribonucleoside triphosphates, MgCl2, DNA polymerase, and 
conventional buffers. The DNA can be amplified for a number 
of cycles. It is generally possible to increase the 
sensitivity of detection by using a multiplicity of cycles, each 
cycle consisting of a short period of denaturation of the DNA 
at an elevated temperature, cooling of the reaction mixture, 
and polymerization with the DNA polymerase. Amplified 
sequences can be detected by the use of a technique termed 
oligomer restriction (OR). See R. K. Saiki et al., 
Bio/Technology 3:1008-1012 (1985).
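
A minimal sketch of primers whose 5' ends are extended with restriction sites is given below. The template sequence and primer length are hypothetical; the BamHI (GGATCC) and SalI (GTCGAC) recognition sequences are the standard ones and are used here only as examples of sites that could be appended.

```python
BAMHI = "GGATCC"  # BamHI recognition sequence
SALI = "GTCGAC"   # SalI recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def tailed_primers(template: str, primer_len: int, fwd_site: str, rev_site: str):
    """Build a primer pair flanking the template, each with a 5' site tail."""
    forward = fwd_site + template[:primer_len]
    reverse = rev_site + reverse_complement(template[-primer_len:])
    return forward, reverse


def ideal_copies(cycles: int) -> int:
    """Copies per starting duplex if every cycle doubled the product."""
    return 2 ** cycles


TEMPLATE = "ATGACCGGTTTAGCACGTCCATTGAAGGCTTAA"  # hypothetical target region
fwd, rev = tailed_primers(TEMPLATE, 10, BAMHI, SALI)
print(fwd, rev, ideal_copies(25))
```

Actual yields fall short of the ideal doubling figure, so `ideal_copies` should be read as an upper bound and not as a prediction.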

The enzyme I-SceI is one of a number of endonucleases with 
similar properties. Following is a listing of related 
enzymes and their sources. 

Group I intron encoded endonucleases and related enzymes 
are listed below with references. Recognition sites are 
shown in Fig. 6. 



Enzyme       Encoded by             Ref 

I-SceI       Sc                     this work 
I-SceII      Sc                     Sargueil et al., NAR (1990) 18, 5659-5665 
I-SceIII     Sc                     Sargueil et al., MGG (1991) 225, 340-341 
I-SceIV      Sc                     Seraphin et al. (1992) in press 
I-CeuI       Ce                     Marshall, Lemieux, Gene (1991) 104, 241-245 
I-CreI       Cr                     Rochaix (unpublished) 
I-PpoI       Pp                     Muscarella et al., MCB (1990) 10, 3386-3396 
I-TevI       T4 td-1 intron         Chu et al., PNAS (1990) 87, 3574-3578 and 
                                    Bell-Pedersen et al., NAR (1990) 18, 3763-3770 
I-TevII      T4 sunY intron         Bell-Pedersen et al., NAR (1990) 18, 3763-3770 
I-TevIII     RB3 nrdB-1 intron      Eddy, Gold, Genes Dev. (1991) 5, 1032-1041 
HO           HO yeast gene          Nickoloff et al., MCB (1990) 10, 1174-1179 
Endo SceI    RF3 yeast mito. gene   Kawasaki et al., JBC (1991) 266, 5342-5347 



Putative new enzymes (genetic evidence but no activity 
as yet) are I-CsmI from cytochrome b intron 1 of 
Chlamydomonas smithii mitochondria (ref. 15), I-PanI from 
cytochrome b intron 3 of Podospora anserina mitochondria 
(Jill Salvo), and probably enzymes encoded by introns 
Nc nd1·1 and Nc cob·1 from Neurospora crassa.

The I-endonucleases can be classified as follows: 

Class I: Two dodecapeptide motifs, 4 bp staggered cut with 
3' OH overhangs, cut internal to recognition site 

Subclass "I-SceI"    Other subclasses 

I-SceI               I-SceII 
I-SceIV              I-SceIII 
I-CsmI               I-CeuI (only one dodecapeptide motif) 
I-PanI               I-CreI (only one dodecapeptide motif) 
                     HO 
                     TFP1-408 (HO homolog) 
                     Endo SceI 

Class II: GIY-(N10-11)-YIG motif, 2 bp staggered cut with 3' 
OH overhangs, cut external to recognition site: 
I-TevI 

Class III: no typical structural motifs, 4 bp staggered cut 
with 3' OH overhangs, cut internal to recognition site: 
I-PpoI 

Class IV: no typical structural motifs, 2 bp staggered cut 
with 3' OH overhangs, cut external to recognition site: 
I-TevII 

Class V: no typical structural motifs, 2 bp staggered cut 
with 5' OH overhangs: 
I-TevIII. 

2. Nucleotide Probes Containing the I-SceI 
Gene of The Invention 

The DNA sequence of the invention coding for the enzyme 
I-SceI can also be used as a probe for the detection of a 
nucleotide sequence in a biological material, such as tissue 
or body fluids. The probe can be labeled with an atom or 
inorganic radical, most commonly using a radionuclide, but 
also perhaps with a heavy metal. Radioactive labels include 
³²P, ³H, ¹⁴C, or the like. Any radioactive label can be 
employed, which provides for an adequate signal and has 
sufficient half-life. Other labels include ligands that can 
serve as a specific binding member to a labeled antibody, 
fluorescers, chemiluminescers, enzymes, antibodies which can 
serve as a specific binding pair member for a labeled ligand, 
and the like. The choice of the label will be governed by 
the effect of the label on the rate of hybridization and 
binding of the probe to the DNA or RNA. It will be necessary 
that the label provide sufficient sensitivity to detect the 
amount of DNA or RNA available for hybridization.

When the nucleotide sequence of the invention is used as a 
probe for hybridizing to a gene, the nucleotide sequence is 
preferably affixed to a water insoluble solid, porous sup- 
port, such as nitrocellulose paper. Hybridization can be 
carried out using labeled polynucleotides of the invention 
and conventional hybridization reagents. The particular hy- 
bridization technique is not essential to the invention. 

The amount of labeled probe present in the hybridization 
solution will vary widely, depending upon the nature of the 
label, the amount of the labeled probe which can reasonably 
bind to the support, and the stringency of the hybridization. 
Generally, substantial excesses of the probe over 
stoichiometric will be employed to enhance the rate of 
binding of the probe to the fixed DNA.

Various degrees of stringency of hybridization can be 
employed. The more severe the conditions, the greater the 
complementarity that is required for hybridization between 
the probe and the polynucleotide for duplex formation. Se- 
verity can be controlled by temperature, probe concentration, 
probe length, ionic strength, time, and the like. Conve- 
niently, the stringency of hybridization is varied by chang- 
ing the polarity of the reactant solution. Temperatures to 
be employed can be empirically determined or determined from 
well known formulas developed for this purpose. 
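
As one example of such a formula, the following sketch applies a widely quoted rule of thumb for short oligonucleotide probes, often called the Wallace rule: 2°C for each A or T and 4°C for each G or C. The rule is named here as an illustration and is not stated in this text; it is a rough approximation intended for short probes, and the 18 bp recognition site is used only as a convenient example sequence.

```python
def wallace_tm(oligo: str) -> int:
    """Estimate melting temperature (°C) of a short oligo: 2*(A+T) + 4*(G+C)."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


probe = "TAGGGATAACAGGGTAAT"  # example oligo: upper strand of the 18 bp site
print(wallace_tm(probe))
```

Hybridization temperatures are usually set a few degrees below such an estimate and then adjusted empirically, as noted above.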

3. Nucleotide Sequences Containing the 
Nucleotide Sequence Encoding I-SceI 

This invention also relates to the DNA sequence of the 
invention encoding the enzyme I-SceI, wherein the nucleotide 
sequence is linked to other nucleic acids. The nucleic acid 
can be obtained from any source, for example, from plasmids, 
from cloned DNA or RNA, or from natural DNA or RNA from any 
source, including prokaryotic and eukaryotic organisms. DNA 
or RNA can be extracted from a biological material, such as 
biological fluids or tissue, by a variety of techniques 
including those described by Maniatis et al., Molecular 
Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, 
New York (1982). The nucleic acid will generally be obtained 
from a bacterium, yeast, virus, or a higher organism, such as 
a plant or animal. The nucleic acid can be a fraction of a 
more complex mixture, such as a portion of a gene contained 
in whole human DNA or a portion of a nucleic acid sequence of 
a particular microorganism. The nucleic acid can be a 
fraction of a larger molecule or the nucleic acid can constitute 
an entire gene or assembly of genes. The DNA can be in a 
single-stranded or double-stranded form. If the fragment is 
in single-stranded form, it can be converted to 
double-stranded form using DNA polymerase according to conventional 
techniques.

The DNA sequence of the invention can be linked to a 
structural gene. As used herein, the term "structural gene" 
refers to a DNA sequence that encodes through its template or 
messenger RNA (mRNA) a sequence of amino acids characteristic of a 
specific protein or polypeptide. The nucleotide sequence of 
the invention can function with an expression control se- 
quence, that is, a DNA sequence that controls and regulates 
expression of the gene when operatively linked to the gene. 

4. Vectors Containing the Nucleotide 
Sequence of the Invention 

This invention also relates to cloning and expression 
vectors containing the DNA sequence of the invention coding 
for the enzyme I-SceI.

More particularly, the DNA sequence encoding the enzyme 
can be ligated to a vehicle for cloning the sequence. The 
major steps involved in gene cloning comprise procedures for 
separating DNA containing the gene of interest from 
prokaryotes or eukaryotes, cutting the resulting DNA fragment and 
the DNA from a cloning vehicle at specific sites, mixing the 
two DNA fragments together, and ligating the fragments to 
yield a recombinant DNA molecule. The recombinant molecule 
can then be transferred into a host cell, and the cells 
allowed to replicate to produce identical cells containing 
clones of the original DNA sequence.

The vehicle employed in this invention can be any 
double-stranded DNA molecule capable of transporting the 
nucleotide sequence of the invention into a host cell and 
capable of replicating within the cell. More particularly, 
the vehicle must contain at least one DNA sequence that can 
act as the origin of replication in the host cell. In addi- 
tion, the vehicle must contain two or more sites for inser- 
tion of the DNA sequence encoding the gene of the invention. 
These sites will ordinarily correspond to restriction enzyme 
sites at which cohesive ends can be formed, and which are 
complementary to the cohesive ends on the promoter sequence 
to be ligated to the vehicle. In general, this invention can 
be carried out with plasmid, bacteriophage, or cosmid 
vehicles having these characteristics. 

The nucleotide sequence of the invention can have cohe- 
sive ends compatible with any combination of sites in the 
vehicle. Alternatively, the sequence can have one or more 
blunt ends that can be ligated to corresponding blunt ends in 
the cloning sites of the vehicle. The nucleotide sequence to 
be ligated can be further processed, if desired, by 
successive exonuclease deletion, such as with the enzyme Bal 31. 
In the event that the nucleotide sequence of the invention 
does not contain a desired combination of cohesive ends, the 
sequence can be modified by adding a linker, an adaptor, or 
homopolymer tailing.

It is preferred that plasmids used for cloning nucle- 
otide sequences of the invention carry one or more genes re- 
sponsible for a useful characteristic, such as a selectable 
marker, displayed by the host cell. In a preferred strategy, 
plasmids having genes for resistance to two different drugs 
are chosen. For example, insertion of the DNA sequence into 
a gene for an antibiotic inactivates the gene and destroys 
drug resistance. The second drug resistance gene is not af- 
fected when cells are transformed with the recombinants, and 
colonies containing the gene of interest can be selected by 
resistance to the second drug and susceptibility to the first 
drug. Preferred antibiotic markers are genes imparting 
chloramphenicol, ampicillin, or tetracycline resistance to 
the host cell. 
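
The two-drug selection logic described above can be sketched as a simple screen over colony phenotypes. The colony names and resistance results are hypothetical data invented for the example.

```python
def is_recombinant(resistant_to_first: bool, resistant_to_second: bool) -> bool:
    """Insert disrupts the first resistance gene; the second stays intact."""
    return resistant_to_second and not resistant_to_first


# Hypothetical screening results: colony -> (first drug, second drug) resistance.
colonies = {
    "colony_1": (True, True),    # vehicle without insert
    "colony_2": (False, True),   # insert has inactivated the first gene
    "colony_3": (False, False),  # no vehicle taken up
}

recombinants = [
    name for name, (first, second) in colonies.items()
    if is_recombinant(first, second)
]
print(recombinants)
```

Colonies resistant to both drugs carry the vehicle without an insert, while colonies resistant to neither have not taken up the vehicle at all.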

A variety of restriction enzymes can be used to cut the 
vehicle. The identity of the restriction enzyme will gener- 
ally depend upon the identity of the ends on the DNA sequence 
to be ligated and the restriction sites in the vehicle. The 
restriction enzyme is matched to the restriction sites in the 
vehicle, which in turn is matched to the ends on the nucleic 
acid fragment being ligated.

The ligation reaction can be set up using well known
techniques and conventional reagents. Ligation is carried
out with a DNA ligase that catalyzes the formation of
phosphodiester bonds between adjacent 5'-phosphate and
free 3'-hydroxyl groups in DNA duplexes. The DNA ligase can
be derived from a variety of microorganisms. The preferred
DNA ligases are enzymes from E. coli and bacteriophage T4.
T4 DNA ligase can ligate DNA fragments with blunt or sticky
ends, such as those generated by restriction enzyme diges-
tion. E. coli DNA ligase can be used to catalyze the forma-
tion of phosphodiester bonds between the termini of duplex
DNA molecules containing cohesive ends.

Cloning can be carried out in prokaryotic or eukaryotic 
cells. The host for replicating the cloning vehicle will of 
course be one that is compatible with the vehicle and in 
which the vehicle can replicate. When a plasmid is employed, 
the plasmid can be derived from bacteria or some other organ- 
ism or the plasmid can be synthetically prepared. The plas- 
mid can replicate independently of the host cell chromosome 
or an integrative plasmid (episome) can be employed. The 
plasmid can make use of the DNA replicative enzymes of the 
host cell in order to replicate or the plasmid can carry 
genes that code for the enzymes required for plasmid replica- 
tion. A number of different plasmids can be employed in 
practicing this invention. 

The DNA sequence of the invention encoding the enzyme
I-SceI can also be ligated to a vehicle to form an expression
vector. The vehicle employed in this case is one in which it
is possible to express the gene operatively linked to a pro-
moter in an appropriate host cell. It is preferable to em-
ploy a vehicle known for use in expressing genes in E. coli,
yeast, or mammalian cells. These vehicles include, for ex-
ample, the following E. coli expression vectors:

pSCM525, which is an E. coli expression vector derived from
pUC12 by insertion of a tac promoter and the synthetic
gene for I-SceI. Expression is induced by IPTG.

pGEXω6, which is an E. coli expression vector derived from
pGEX in which the synthetic gene from pSCM525 for I-SceI
is fused with the glutathione S transferase gene,
producing a hybrid protein. The hybrid protein pos-
sesses the endonuclease activity.

pDIC73, which is an E. coli expression vector derived from
pET-3C by insertion of the synthetic gene for I-SceI
(NdeI - BamHI fragment of pSCM525) under T7 promoter
control. This vector is used in strain BL21(DE3) which
expresses the T7 RNA polymerase under IPTG induction.

pSCM351, which is an E. coli expression vector derived from
pUR291 in which the synthetic gene for I-SceI is fused
with the Lac Z gene, producing a hybrid protein.

pSCM353, which is an E. coli expression vector derived from
pEX1 in which the synthetic gene for I-SceI is fused
with the Cro/Lac Z gene, producing a hybrid protein.

Examples of yeast expression vectors are: 
pPEX7, which is a yeast expression vector derived from
pRP51-Bam 0 (a LEU2d derivative of pLG-SD5) by insertion
of the synthetic gene under the control of the galactose 
promoter. Expression is induced by galactose. 

pPEX408, which is a yeast expression vector derived from 
pLG-SD5 by insertion of the synthetic gene under the 
control of the galactose promoter. Expression is in- 
duced by galactose. 

Several yeast expression vectors are depicted in Fig. 7. 

Typical mammalian expression vectors are: 
pRSV I-SceI, which is a pRSV derivative in which the
synthetic gene (BamHI - PstI fragment from pSCM525) is
under the control of the LTR promoter of Rous Sarcoma
Virus. This expression vector is depicted in Fig. 8.
Vectors for expression in Chinese Hamster Ovary (CHO) cells 
can also be employed. 

5. Cells Transformed with Vectors of the Invention

The vectors of the invention can be inserted into host
organisms using conventional techniques. For example, the
vectors can be inserted by transformation, transfection,
electroporation, microinjection, or by means of liposomes
(lipofection).

Cloning can be carried out in prokaryotic or eukaryotic 
cells. The host for replicating the cloning vehicle will of 
course be one that is compatible with the vehicle and in
which the vehicle can replicate. Cloning is preferably car-
ried out in bacterial or yeast cells, although cells of fun-
gal, animal, and plant origin can also be employed. The pre-
ferred host cells for conducting cloning work are bacterial
cells, such as E. coli. The use of E. coli cells is par-
ticularly preferred because most cloning vehicles, such as
bacterial plasmids and bacteriophages, replicate in these
cells.

In a preferred embodiment of this invention, an expres- 
sion vector containing the DNA sequence encoding the nucle- 
otide sequence of the invention operatively linked to a pro- 
moter is inserted into a mammalian cell using conventional 
techniques. 

Application of I-SceI for large scale mapping

1. Occurrence of natural sites in various genomes

Using the purified I-SceI enzyme, the occurrence of
natural or degenerate sites has been examined on the complete
genomes of several species. No natural site was found in
Saccharomyces cerevisiae, Bacillus anthracis, Borrelia
burgdorferi, Leptospira biflexa and L. interrogans. One de-
generate site was found on T7 phage DNA.

2. Insertion of artificial sites

Given the absence of natural I-SceI sites, artificial
sites can be introduced by transformation or transfection.
Two cases need to be distinguished: site-directed integra-
tion by homologous recombination and random integration by
non-homologous recombination, transposon movement or
retroviral infection. The first is easy in the case of yeast
and a few bacterial species, and more difficult for higher
eukaryotes. The second is possible in all systems.

3. Insertion vectors

Two types can be distinguished:

-1- Site specific cassettes that introduce the I-SceI
site together with a selectable marker.

For yeast: all are pAF100 derivatives (Thierry et al. (1990)
YEAST 6:521-534) containing the following marker genes:

pAF101    URA3 (inserted in the HindIII site)
pAF103    Neo (inserted in BglII site)
pAF104    HIS3 (inserted in BglII site)
pAF105    Kan (inserted in BglII site)
pAF106    Kan (inserted in BglII site)
pAF107    LYS2 (inserted between HindIII and EcoRV)

A restriction map of the plasmid pAF100 is shown in Fig. 9.
The nucleotide sequence and restriction sites of regions of
plasmid pAF100 are shown in Figs. 10A and 10B.
Many transgenic yeast strains with the I-SceI site at various
and known places along chromosomes are available.

-2- Vectors derived from transposable elements or
retroviruses.

For E. coli and other bacteria: mini Tn5 derivatives con-
taining the I-SceI site and

pTSmω    StrR
pTKmω    KanR (See Fig. 11)
pTTcω    TetR

For yeast: pTyω6 is a pD123 derivative in which the I-SceI
site has been inserted in the LTR of the Ty element
(Fig. 12).

For mammalian cells:

pMLV LTR SAPLZ: containing the I-SceI site in the LTR of MLV
and Phleo-LacZ (Fig. 13). This vector is first grown in ψ2
cells (3T3 derivative, from R. Mulligan). Two transgenic
cell lines with the I-SceI site at undetermined locations in
the genome are available: 1009 (pluripotent nerve cells,
J.F. Nicolas) and D3 (ES cells able to generate transgenic
animals).

4. The nested chromosomal fragmentation strategy

The nested chromosomal fragmentation strategy for
genetically mapping a eukaryotic genome exploits the unique
properties of the restriction endonuclease I-SceI, such as an
18 bp long recognition site. The absence of natural I-SceI
recognition sites in most eukaryotic genomes is also
exploited in this mapping strategy.
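
The rarity of an 18 bp recognition site can be illustrated with a rough calculation. The sketch below assumes a random sequence with equal base frequencies, counts both strands for an asymmetric site, and uses round genome sizes chosen only for illustration; none of these figures is taken from the experiments described here.

```python
# Rough estimate of chance occurrences of a recognition site.
# Assumptions (not from the specification): random sequence, equal base
# frequencies, and approximate round genome sizes.
def expected_sites(genome_bp, site_len, strands=2):
    """Expected number of chance matches to a site of length site_len.

    strands=2 counts both orientations of an asymmetric site;
    use strands=1 for a palindromic site.
    """
    return strands * genome_bp / 4 ** site_len


genomes = {
    "yeast-sized genome (about 1.2e7 bp)": 1.2e7,
    "mammalian-sized genome (about 3e9 bp)": 3e9,
}
for label, size in genomes.items():
    rare = expected_sites(size, 18)            # 18 bp asymmetric site
    common = expected_sites(size, 6, strands=1)  # 6 bp palindromic site
    print(f"{label}: 18 bp site {rare:.2e}, 6 bp site {common:.2e}")
```

Under these assumptions the expected number of chance 18 bp sites stays well below one even for a mammalian-sized genome, whereas a 6 bp site is expected many thousands of times.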

First, one or more I-SceI recognition sites are
artificially inserted at various positions in a genome, by
homologous recombination using specific cassettes containing
selectable markers or by random insertion, as discussed
supra. The genome of the resulting transgenic strain is then
cleaved completely at the artificially inserted I-SceI
site(s) upon incubation with the I-SceI restriction enzyme.
The cleavage produces nested chromosomal fragments.

The chromosomal fragments are then purified and
separated by pulsed field gel (PFG) electrophoresis, allowing
one to "map" the position of the inserted site in the
chromosome. If total DNA is cleaved with the restriction
enzyme, each artificially introduced I-SceI site provides a
unique "molecular milestone" in the genome. Thus, a set of
transgenic strains, each carrying a single I-SceI site, can
be created which defines physical genomic intervals between
the milestones. Consequently, an entire genome, a chromosome
or any segment of interest can be mapped using artificially
introduced I-SceI restriction sites.

The nested chromosomal fragments may be transferred to a 
solid membrane and hybridized to a labelled probe containing 
DNA complementary to the DNA of the fragments. Based on the 
hybridization banding patterns that are observed, the 
eukaryotic genome may be mapped. The set of transgenic 
strains with appropriate "milestones" is used as a reference 
to map any new gene or clone by direct hybridization.

Example 1: Application of the Nested Chromosomal
Fragmentation Strategy to the Mapping of Yeast Chromosome XI

This strategy has been applied to the mapping of yeast
chromosome XI of Saccharomyces cerevisiae. The I-SceI site
was inserted at 7 different locations along chromosome XI of
the diploid strain FY1679, hence defining eight physical
intervals in that chromosome. Sites were inserted from a
URA3-1-I-SceI cassette by homologous recombination. Two
sites were inserted within genetically defined genes, TIF1
and FAS1; the others were inserted at unknown positions in
the chromosome from five non-overlapping cosmids of our
library, taken at random. Agarose embedded DNA of each of
the seven transgenic strains was then digested with I-SceI
and analyzed by pulsed field gel electrophoresis (Fig. 14A).
The position of the I-SceI site of each transgenic strain in
chromosome XI is first deduced from the fragment sizes
without consideration of the left/right orientation of the
fragments. Orientation was determined as follows. The most
telomere proximal I-SceI site from this set of strains is in
the transgenic E40 because the 50 kb fragment is the shortest
of all fragments (Fig. 15A). Therefore, the cosmid clone
pUKG040, which was used to insert the I-SceI site in the
transgenic E40, is now used as a probe against all chromosome
fragments (Fig. 14B). As expected, pUKG040 lights up the two
fragments from strain E40 (50 kb and 630 kb, respectively).
The large fragment is close to the entire chromosome XI and
shows a weak hybridization signal due to the fact that the
insert of pUKG040, which is 38 kb long, contains less than 4
kb within the large chromosome fragment. Note that the
entire chromosome XI remains visible after I-SceI digestion,
due to the fact that the transgenic strains are diploids in
which the I-SceI site is inserted in only one of the two
homologs. Now, the pUKG040 probe hybridizes to only one
fragment of all other transgenic strains, allowing unambiguous
left/right orientation of I-SceI sites (See Fig. 15B). No
significant cross hybridization between the cosmid vector and
the chromosome subfragment containing the I-SceI site
insertion vector is visible. Transgenic strains can now be
ordered such that I-SceI sites are located at increasing
distances from the hybridizing end of the chromosome
(Fig. 15C) and the I-SceI map can be deduced (Fig. 15D).
Precision of the mapping depends upon PFGE resolution and
optimal calibration. Note that actual left/right orientation
of the chromosome with respect to the genetic map is not
known at this step. To help visualize our strategy and to
obtain more precise measurements of the interval sizes
between I-SceI sites, a new pulsed field gel
electrophoresis with the same transgenic strains now placed
in order was made (Fig. 16). After transfer, the fragments
were hybridized successively with cosmids pUKG040 and pUKG066
which light up, respectively, all fragments from the opposite
ends of the chromosome (clone pUKG066 defines the right end
of the chromosome as defined from the genetic map because it
contains the SIR1 gene). A regular stepwise progression of
chromosome fragment sizes is observed. Note some cross
hybridization between the probe pUKG066 and chromosome III,
probably due to some repetitive DNA sequences.

All chromosome fragments, taken together, now define
physical intervals as indicated in Fig. 15D. The I-SceI map
obtained has an 80 kb average resolution.
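
The ordering step of Example 1 can be sketched as a short calculation. In each strain, the fragment that hybridizes to the end-proximal probe extends from the hybridizing end of the chromosome to the inserted site, so its size gives the distance of the site from that end. In the sketch below only the E40 value (50 kb) and the chromosome length implied by the two E40 fragments (50 kb + 630 kb) come from the text; the other strain names and sizes are invented for illustration.

```python
# Sketch of ordering single-site strains by the size of the fragment that
# hybridizes to a probe from the end-proximal site.
CHROMOSOME_KB = 680  # 50 kb + 630 kb, the two E40 fragments

# strain -> size (kb) of the probe-positive fragment.
# Only E40 is taken from the text; S1..S6 are hypothetical.
hybridizing_fragment_kb = {
    "E40": 50,
    "S1": 310,
    "S2": 120,
    "S3": 600,
    "S4": 200,
    "S5": 520,
    "S6": 400,
}


def order_sites(frag_kb):
    """Sort strains by distance of their site from the hybridizing end."""
    return sorted(frag_kb.items(), key=lambda item: item[1])


def intervals(frag_kb, chromosome_kb):
    """Sizes of the physical intervals delimited by the ordered sites."""
    positions = [0] + [pos for _, pos in order_sites(frag_kb)] + [chromosome_kb]
    return [b - a for a, b in zip(positions, positions[1:])]


ordered = order_sites(hybridizing_fragment_kb)
sizes = intervals(hybridizing_fragment_kb, CHROMOSOME_KB)
print([name for name, _ in ordered])
print(sizes)
```

Seven sites give eight intervals whose sizes sum to the chromosome length. For E40 itself the probe lights up both fragments, and the shorter one is used as its position.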

Example 2 : Application of the Nested Chromosomal 
Fragmentation Strategy to the Mapping of Yeast Artificial 
Chromosome (YAC) Clones 

This strategy can be applied to YAC mapping with two 
possibilities. 

-1- insertion of the I-SceI site within the gene of
interest using homologous recombination in yeast. This per-
mits mapping of that gene in the YAC insert by I-SceI diges-
tion in vitro. This has been done and works.

-2- random integration of I-SceI sites along the YAC
insert by homologous recombination in yeast using highly re-
petitive sequences (e.g., B2 in mouse or Alu in human).
Transgenic strains are then used as described in ref. P1 to
sort libraries or map genes.

The procedure has now been extended to a YAC containing
450 kb of mouse DNA. To this end, a repeated sequence of
mouse DNA (called B2) has been inserted in a plasmid
containing the I-SceI site and a selectable yeast marker
(LYS2). Transformation of the yeast cells containing the
recombinant YAC with the plasmid linearized within the B2
sequence resulted in the integration of the I-SceI site at
five different locations distributed along the mouse DNA
insert. Cleavage at the inserted I-SceI sites using the
enzyme has been successful, producing nested fragments that
can be purified after electrophoresis. Subsequent steps of
the protocol exactly parallel the procedure described in
Example 1.

Example 3: Application of Nested Chromosomal Fragments to the 
Direct Sorting of Cosmid Libraries 

The nested chromosomal fragments can be purified from
preparative PFG and used as probes against clones from a
chromosome XI specific sublibrary. This sublibrary is
composed of 138 cosmid clones (corresponding to eight times
coverage) which have been previously sorted from our complete
yeast genomic libraries by colony hybridization with PFG
purified chromosome XI. This collection of unordered clones
has been sequentially hybridized with chromosome fragments
taken in order of increasing sizes from the left end of the
chromosome. Localization of each cosmid clone on the I-SceI
map could be unambiguously determined from such
hybridizations. To further verify the results and to provide
a more precise map, a subset of all cosmid clones, now
placed in order, has been digested with EcoRI,
electrophoresed and hybridized with the nested series of
chromosome fragments in order of increasing sizes from the
left end of the chromosome. Results are given in Figure 17.

For a given probe, two cases can be distinguished:
cosmid clones in which all EcoRI fragments hybridize with the
probe and cosmid clones in which only some of the EcoRI
fragments hybridize (e.g., compare pEKG100 to pEKG098 in Fig.
17b). The first category corresponds to clones in which the
insert is entirely included in one of the two chromosome
fragments, the second to clones in which the insert overlaps
an I-SceI site. Note that, for clones of the pEKG series,
the EcoRI fragment of 8 kb is entirely composed of vector
sequences (pWE15) that do not hybridize with the chromosome
fragments. In the case where the chromosome fragment
possesses the integration vector, a weak cross hybridization
with the cosmid is observed (Fig. 17e).

Examination of Fig. 17 shows that the cosmid clones can
unambiguously be ordered with respect to the I-SceI map
(Fig. 13E), each clone falling either in a defined interval
or across an I-SceI site. In addition, clones from the
second category allow us to place some EcoRI fragments on the
I-SceI maps, while others remain unordered. The complete set
of chromosome XI-specific cosmid clones, covering altogether
eight times the equivalent of the chromosome, has been sorted
with respect to the I-SceI map, as shown in Fig. 18.
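
The sorting rule used in Example 3 can be sketched as follows. The sketch assumes that the probes are the nested fragments measured from the left end, taken in order of increasing size, and that for each probe a clone is scored as "all", "some" or "none" according to how many of its insert-derived EcoRI fragments hybridize (the vector-only fragment is ignored). The function name and the scoring labels are hypothetical.

```python
# Sketch of placing a cosmid clone on the site map from its
# hybridization pattern against the nested chromosome fragments.
def place_clone(pattern):
    """Return the map position implied by a hybridization pattern.

    pattern[i] is "none", "some" or "all": the share of the clone's
    insert-derived EcoRI fragments that hybridize with nested fragment
    i + 1 (probes ordered by increasing size from the left end).
    """
    for i, signal in enumerate(pattern, start=1):
        if signal == "some":
            return f"spans site {i}"       # insert overlaps site i
        if signal == "all":
            return f"interval {i}"         # between site i-1 and site i
    return f"interval {len(pattern) + 1}"  # to the right of the last site


print(place_clone(["none", "some", "all"]))
print(place_clone(["all", "all", "all"]))
print(place_clone(["none", "none", "none"]))
```

The first probe that gives any signal decides the placement: a partial signal means the insert overlaps that site, while a complete signal means the insert lies wholly within the interval ending at that site.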

5. Partial restriction mapping using I-SceI

In this embodiment, complete digestion of the DNA at the
artificially inserted I-SceI site is followed by partial
digestion with bacterial restriction endonucleases of choice.
The restriction fragments are then separated by
electrophoresis and blotted. Indirect end labelling is
accomplished using left or right I-SceI half sites. This
technique has been successful with yeast chromosomes and
should be applicable without difficulty for YAC.

Partial restriction mapping has been done on yeast DNA
and on mammalian cell DNA using the commercial enzyme I-SceI.
DNA from cells containing an artificially inserted I-SceI
site is first cleaved to completion by I-SceI. The DNA is
then treated under partial cleavage conditions with bacterial
restriction endonucleases of interest (e.g., BamHI) and
electrophoresed along with size calibration markers. The DNA
is transferred to a membrane and hybridized successively us-
ing the short sequences flanking the I-SceI sites on either
side (these sequences are known because they are part of the
original insertion vector that was used to introduce the
I-SceI site). Autoradiography (or other equivalent detection
system using non radioactive probes) permits the visualization
of ladders, which directly represent the succession of the
bacterial restriction endonuclease sites from the I-SceI
site. The size of each band of the ladder is used to
calculate the physical distance between the successive
bacterial restriction endonuclease sites.
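
The final calculation, from ladder band sizes to distances between successive sites, can be sketched as below. Each band detected by the flanking probe runs from the I-SceI cut to one partially cleaved site, so the sorted band sizes are the site positions measured from the I-SceI site. The band sizes used here are hypothetical.

```python
# Sketch of converting an indirect end-labelling ladder into
# distances between successive restriction sites.
def site_distances(band_sizes_kb):
    """Distances between successive sites, starting at the I-SceI cut.

    The first value is the distance from the I-SceI site to the nearest
    restriction site; each later value is the spacing to the next site.
    """
    positions = sorted(band_sizes_kb)
    return [b - a for a, b in zip([0] + positions, positions)]


bands = [12, 3, 7, 20]  # hypothetical band sizes in kb
print(site_distances(bands))
```

The distances always sum to the size of the largest band, which is a quick consistency check on a measured ladder.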

Application of I-SceI for In Vivo
Site Directed Recombination

1. Expression of I-SceI in yeast

The synthetic I-SceI gene has been placed under the
control of a galactose inducible promoter on multicopy
plasmids pPEX7 and pPEX408. Expression is correct and
induces effects on site as indicated below. A transgenic
yeast with the I-SceI synthetic gene inserted in a chromosome
under the control of an inducible promoter can be
constructed.

2. Effects of site specific double strand breaks in
yeast (refs. 18 and P4)

Effects on plasmid-borne I-SceI sites:

Intramolecular effects are described in detail in
Ref. 18. Intermolecular (plasmid to chromosome)
recombination can be predicted.

Effects on chromosome integrated I-SceI sites:

In a haploid cell, a single break within a chromosome at
an artificial I-SceI site results in cell division arrest
followed by death (only a few % of survival). Presence of an
intact sequence homologous to the cut site results in repair
and 100% cell survival. In a diploid cell, a single break
within a chromosome at an artificial I-SceI site results in
repair using the chromosome homolog and 100% cell survival.
In both cases, repair of the induced double strand break re-
sults in loss of heterozygosity with deletion of the non-ho-
mologous sequences flanking the cut and insertion of the non-
homologous sequences from the donor DNA molecule.

3. Application for in vivo recombination of YACs in yeast

Construction of a YAC vector with the I-SceI restriction
site next to the cloning site should permit one to induce
homologous recombination with another YAC if inserts are
partially overlapping. This is useful for the construction
of contigs.

4. Prospects for other organisms

Insertion of an I-SceI restriction site has been done
for bacteria (E. coli, Yersinia enterocolitica, Y. pestis, Y.
pseudotuberculosis), and mouse cells. Cleavage at the
artificial I-SceI site in vitro has been successful with DNA
from the transgenic mouse cells. Expression of I-SceI from
the synthetic gene in mammalian or plant cells should be
successful.

The I-Scel site has been introduced in mouse cells and 
bacterial cells as follows: 
-1- Mouse cells:

-a- Mouse cells (ψ2) were transfected with the DNA
of the vector pMLV LTR SAPLZ containing the I-SceI site using
standard calcium phosphate transfection technique.

-b- Transfected cells were selected in DMEM medium
containing phleomycin with 5% fetal calf serum and grown un-
der 12% CO2, 100% humidity at 37°C until they formed colonies.

-c- Phleomycin resistant colonies were subcloned
once in the same medium.

-d- Clone MLOP014, which gave a titer of 10^5 virus
particles per ml, was chosen. This clone was deposited at
C.N.C.M. on May 5, 1992 under culture collection accession
No. I-1207.

-e- The supernatant of this clone was used to infect
other mouse cells (1009) by spreading 10^5 virus particles on
10^5 cells in DMEM medium with 10% fetal calf serum and 5 mg/
ml of "polybrene". Medium was replaced 6 hours after
infection by the same fresh medium.

-f- 24 hours after infection, phleomycin resistant
cells were selected in the same medium as above.

-g- Phleomycin resistant colonies were subcloned
once in the same medium.

-h- One clone was picked and analyzed. DNA was
purified with standard procedures and digested with I-SceI
under optimal conditions.
-2- Bacterial cells: 

Mini Tn5 transposons containing the I-SceI
recognition site were constructed in E. coli by standard
recombinant DNA procedures. The mini Tn5 transposons are
carried on a conjugative plasmid. Bacterial conjugation
between E. coli and Yersinia is used to integrate the mini
Tn5 transposon in Yersinia. Yersinia cells resistant to
kanamycin, streptomycin or tetracycline are selected (vectors
pTKmω, pTSmω and pTTcω, respectively).

Several strategies can be attempted for the site spe- 
cific insertion of a DNA fragment from a plasmid into a chro- 
mosome. This will make it possible to insert transgenes at 
predetermined sites without laborious screening steps. 
Strategies are: 

-1- Construction of a transgenic cell in which the
I-SceI recognition site is inserted at a unique location in a
chromosome. Cotransformation of the transgenic cell with the
expression vector and a plasmid containing the gene of
interest and a segment homologous to the sequence in which
the I-SceI site is inserted.

-2- Insertion of the I-SceI recognition site next to or
within the gene of interest carried on a plasmid.
Cotransformation of a normal cell with the expression vector
carrying the synthetic I-SceI gene and the plasmid containing
the I-SceI recognition site.

-3- Construction of a stable transgenic cell line in
which the I-SceI gene has been integrated in the genome under
the control of an inducible or constitutive cellular pro-
moter. Transformation of the cell line by a plasmid contain-
ing the I-SceI site next to or within the gene of interest.

Site directed homologous recombination: diagrams of
successful experiments performed in yeast are given in
Fig. 19.

Induction of homologous recombination in mammalian
chromosomes using the I-Sce I system of Saccharomyces
cerevisiae

Example 4 

INTRODUCTION 

Homologous recombination (HR) between chromosomal and
exogenous DNA is at the basis of methods for introducing genetic
changes into the genome (5B, 20B). Parameters of the
recombination mechanism have been determined by studying plasmid
sequences introduced into cells (1B, 4B, 10B, 12B) and in an in
vitro system (8B). HR is inefficient in mammalian cells but is
promoted by double-strand breaks in DNA.

So far, it has not been possible to cleave a specific
chromosomal target efficiently, thus limiting our understanding of
recombination and its exploitation. Among endonucleases, the
Saccharomyces cerevisiae mitochondrial endonuclease I-Sce I (6B)
has characteristics which can be exploited as a tool for cleaving
a specific chromosomal target and, therefore, manipulating the
chromosome in living organisms. I-Sce I protein is an
endonuclease responsible for intron homing in mitochondria of
yeast, a non-reciprocal mechanism by which a predetermined
sequence becomes inserted at a predetermined site. It has been
established that endonuclease I-Sce I can catalyze recombination
in the nucleus of yeast by initiating a double-strand break (17B).
The recognition site of endonuclease I-Sce I is 18 bp long;
therefore, the I-Sce I protein is a very rare cutting restriction
endonuclease in genomes (22B). In addition, as the I-Sce I
protein is not a recombinase, its potential for chromosome
engineering is larger than that of systems with target site
requirements on both host and donor molecules (9B).

We demonstrate here that the yeast I-Sce I endonuclease can
efficiently induce double-strand breaks in a chromosomal target in
mammalian cells and that the breaks can be repaired using a donor
molecule that shares homology with the regions flanking the break.
The enzyme catalyzes recombination at a high efficiency. This
demonstrates that recombination between chromosomal DNA and
exogenous DNA can occur in mammalian cells by the double-strand
break repair pathway (21B).

MATERIALS AND METHODS 
Plasmid construction 

pG-MPL was obtained in four steps: (I) insertion of the
0.3 kb Bgl II - Sma I fragment (treated with Klenow enzyme) of the
Moloney Murine Leukemia Virus (MoMuLV) env gene (25B) containing
SA between the Nhe I and Xba I sites (treated with Klenow enzyme),
in the U3 sequence of the 3'LTR of MoMuLV, in an intermediate
plasmid; (II) insertion in this modified LTR, with linker
adaptors, of the 3.5 kb Nco I - Xho I fragment containing the
PhleoLacZ fusion gene (15B) (from pUT65 from Cayla laboratory) at
the Xba I site next to SA; (III) insertion of this 3'LTR
(containing SA and PhleoLacZ), recovered by Sal I - EcoR I double
digestion, in the p5'LTR plasmid (a plasmid containing the 5'LTR
up to nucleotide number 563 of MoMuLV (26B)) between the Xho I
and the EcoR I sites; and (IV) insertion of a synthetic I-Sce I
recognition site into the Nco I site in the 3'LTR (between SA and
PhleoLacZ).

pG-MtkPL was obtained by the insertion (antisense to the
retroviral genome) of the 1.6 kb tk gene with its promoter with
linker adaptors at the Pst I site of pG-MPL. pVRneo was
obtained in two steps: (I) insertion into pSP65 (from Promega)
linearized by Pst I - EcoR I double digestion of the 4.5 kb Pst I
to EcoR I fragment of pG-MPL containing the 3'LTR with the SA and
PhleoLacZ; (II) insertion of the 2.0 kb Bgl II - BamH I fragment
(treated with Klenow enzyme) containing neoPolyA from pRSVneo into
the Nco I restriction site (treated with Klenow enzyme) of pSP65
containing part of the 3'LTR of G-MPL (between SA and PhleoLacZ).

pCMV(I-Sce I+) was obtained in two steps: (I) insertion of
the 0.73 kb BamH I - Sal I, I-Sce I containing fragment (from
pSCM525, A. Thierry, personal gift) into the phCMV1 (F. Meyer,
personal gift) plasmid cleaved at the BamH I and the Sal I sites;
(II) insertion of a 1.6 kb (nucleotide number 3204 to 1988 in
SV40) fragment containing the polyadenylation signal of SV40 into
the Pst I site of phCMV1.

pCMV(I-Sce I-) contains the I-Sce I ORF in reverse
orientation in the pCMV(I-Sce I+) plasmid. It has been obtained
by inserting the BamH I - Pst I I-Sce I ORF fragment (treated with
Klenow enzyme) into the phCMV PolyA vector linearized by Nsi I and
Sal I double digestion and treated with Klenow enzyme.

Plasmids pG-MPL, pG-MtkPL, pG-MtkAPAPL have been described.
In addition to the plasmids described above, any kind of plasmid
vector can be constructed containing various promoters, genes,
polyA site, I-Sce I site.

Cell culture and selection

3T3, PCC7-S, ψ2 are referenced in (7B) and (13B). Cell
selection medium: gancyclovir (14B, 23B) was added into the tissue
culture medium at a concentration of 2 μM. Gancyclovir
selection was maintained on cells for 6 days. G418 was added
into the appropriate medium at a concentration of 1 mg/ml for
PCC7-S and 400 μg/ml for 3T3. The selection was maintained
throughout the cell culture. Phleomycin was used at a
concentration of 10 μg/ml.

Cell lines

- ψ2 cell line was transfected with plasmids containing a proviral
recombinant vector that contains the I-Sce I recognition site:
pG-MPL, pG-MtkPL, pG-Mtk ApA PL

- NIH 3T3 fibroblastic cell line is infected with:

G-MPL. Multiple (more than 30) clones were recovered. The
presence of 1 to 14 proviral integrations and the multiplicity of
the different points of integration were verified by molecular
analysis.

G-MtkPL. 4 clones were recovered (3 of them have one normal
proviral integration and 1 of them has a recombination between
the two LTRs and so presents only one I-Sce I recognition site).

- Embryonal carcinoma PCC7-S cell line is infected with:

G-MPL. 14 clones were recovered, with normal proviral
integration.

- Embryonic stem cell line D3 is infected with:

G-MPL. 4 clones were recovered (3 have normal proviral
integration, 1 has 4 proviral integrations).

"Prepared" mouse cells:

Insertion of the retrovirus (proviral integration) induces
duplication of the LTR containing the I-Sce I site. The cell is
heterozygotic for the site.

Transfection, infection, cell staining
and nucleic acids blot analysis

These procedures were performed as described in (2B, 3B).

RESULTS

To detect I-Sce I-induced HR we have designed the experimental
system shown in Fig. 20. Defective recombinant retroviruses (24B)
were constructed with the I-Sce I recognition site and a PhleoLacZ
(15B) fusion gene inserted in their 3'LTR (Fig. 20a). Retroviral
integration results in two I-Sce I sites located 5.8 kb or
7.2 kb apart in the cell genome (Fig. 20b). We
hypothesized that I-Sce I-induced double-strand breaks (DSB) at
these sites (Fig. 20c) could initiate HR with a donor plasmid
(pVRneo, Fig. 20d) containing sequences homologous to the flanking
regions of the DSBs and that non-homologous sequences, carried by
the donor plasmid, could be copied during this recombination
(Fig. 20e).

Introduction of duplicated I-Sce I recognition sites into the 
genome of mammalian cells by retrovirus integration 

More specifically, two proviral sequences were used in these
studies. The G-MtkPL proviral sequences (from G-MtkPL virus)
contain the PhleoLacZ fusion gene for positive selection of
transduced cells (in phleomycin-containing medium) and the tk
gene for negative selection (in gancyclovir-containing medium).
The G-MPL proviral sequences (from G-MPL virus) contain only the
PhleoLacZ sequences. G-MtkPL and G-MPL are defective recombinant
retroviruses (16B) constructed from an enhancerless Moloney murine
leukemia provirus. The virus vector functions as a promoter trap
and therefore is activated by flanking cellular promoters.

Virus-producing cell lines were generated by transfecting
pG-MtkPL or pG-MPL into the ψ-2 packaging cell line (13B). Northern
blot analysis of viral transcripts shows (Fig. 21) that the
ψ-2-G-MPL line expresses 4.2 and 5.8 kb transcripts that
hybridized with LacZ probes. These transcripts probably initiate
in the 5'LTR and terminate in the 3'LTR. The 4.5 kb transcript
corresponds to the spliced message and the 5.8 kb transcript to
the unspliced genomic message (Fig. 21A). This verified the
functionality of the 5'LTR and of the splice donor and acceptor in
the virus. Similar results have been obtained with ψ-2-G-MtkPL.
Virus was prepared from the culture medium of ψ-2 cell lines.

NIH3T3 fibroblasts and PCC7-S multipotent mouse cell lines
(7B) were next infected by G-MtkPL and G-MPL respectively, and
clones were isolated. Southern blot analysis of the DNA prepared
from the clones demonstrated LTR-mediated duplication of I-Sce I
PhleoLacZ sequences (Fig. 22a). Bcl I digestion generated the
expected 5.8 kb (G-MPL) or 7.2 kb (G-MtkPL) fragments. The
presence of two additional fragments corresponding to Bcl I sites
in the flanking chromosomal DNA demonstrates a single proviral
target in each clone isolated. Their variable size from clone to
clone indicates integration of retroviruses at distinct loci.
That I-Sce I recognition sites have been faithfully duplicated was
shown by I-Sce I digests, which generated 5.8 kb (G-MPL) or
7.2 kb (G-MtkPL) fragments (Fig. 22b).

Induction by I-Sce I of recombination leading to DNA exchange

The phenotype conferred to the NIH3T3 cells by G-MtkPL virus
is phleoᴿ β-gal⁺ glsˢ and to PCC7-S by G-MPL is phleoᴿ β-gal⁺
(Fig. 20b). To allow for direct selection of recombination events
induced by I-Sce I, we constructed the pVRneo donor plasmid. In pVRneo
the neo gene is flanked by 300 bp homologous to sequences 5' to
the left chromosomal break and 2.5 kb homologous to sequences 3'
to the right break (Fig. 20d). A polyadenylation signal was
positioned 3' to the neo gene to interrupt the PhleoLacZ message
following recombination. If an induced recombination between the
provirus and the plasmid occurs, the resulting phenotype will be
neoᴿ and, due to the presence of a polyadenylation signal in the
donor plasmid, the PhleoLacZ gene should not be expressed,
resulting in a phleoˢ β-gal⁻ phenotype.

With G-MtkPL and G-MtkΔPAPL, it is possible to select
simultaneously for the gap by negative selection with the tk gene
(with gancyclovir) and for the exchange of the donor plasmid by
positive selection with the neo gene (with geneticin). With
G-MPL only the positive selection can be applied, in medium
containing geneticin. Therefore, we expected to select for both
HR and integration of the donor plasmid near an
active endogenous promoter. These two events can be distinguished,
as an induced HR results in a neoᴿ β-gal⁻ phenotype and a random
integration of the donor plasmid results in a neoᴿ β-gal⁺
phenotype.
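The phenotype-based distinction described above can be sketched as a small lookup. This is only an illustration of the selection logic stated in the text: the function name and the returned labels are hypothetical, and the phenotype alone is a first-pass call that the Southern blot analysis described later refines.

```python
def classify_colony(neo_resistant: bool, beta_gal_positive: bool) -> str:
    """Return the event the text associates with a colony phenotype.

    Labels are illustrative (not from the source).
    """
    if not neo_resistant:
        # Such cells do not survive geneticin selection.
        return "not selected"
    if beta_gal_positive:
        # PhleoLacZ is still expressed, so the donor integrated elsewhere.
        return "random integration"
    # PhleoLacZ message interrupted by the donor polyadenylation signal.
    return "induced HR"


for phenotype in [(True, False), (True, True)]:
    print(phenotype, "->", classify_colony(*phenotype))
```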



Two different NIH3T3/G-MtkPL and three different PCC7-S/G-MPL
clones were then co-transfected with an expression vector for
I-Sce I, pCMV(I-Sce I+), and the donor plasmid, pVRneo. Transient
expression of I-Sce I may result in DSBs at I-Sce I sites,
thereby promoting HR with pVRneo. The control is the
co-transfection with a plasmid which does not express I-Sce I,
pCMV(I-Sce I-), and pVRneo.

NIH3T3/G-MtkPL clones were selected either for loss of
proviral sequences and acquisition of the neoᴿ phenotype (with
gancyclovir and geneticin) or for the neoᴿ phenotype only (Table 1).
In the first case, neoᴿ glsᴿ colonies were recovered with a
frequency of 10⁻⁴ in the experimental series, and no colonies were
recovered in the control series. In addition, all neoᴿ glsᴿ
colonies were β-gal⁻, consistent with their resulting from HR at
the proviral site. In the second case, neoᴿ colonies were
recovered with a frequency of 10 in experimental series, and
with a 10 to 100 fold lower frequency in the control series. In
addition, 90% of the neoᴿ colonies were found to be β-gal⁻ (in
series with pCMV(I-Sce I+)). This shows that expression of
I-Sce I induces HR between pVRneo and the proviral site, that
site-directed HR is ten times more frequent than random
integration of pVRneo near a cellular promoter, and that it is at
least 500 times more frequent than spontaneous HR.

Table 1. Induced homologous recombination with I-Sce I
[table body lost in extraction; surviving cells: 15, 19, SsHR, 0, 4, Del, 0, 0]

TABLE 1: Effect of I-Sce I mediated double-strand cleavage. A.
10⁶ cells of NIH3T3/G-MtkPL clones 1 and 2 and 5×10⁶ cells of
PCC7-S/G-MPL clones 3 to 5 were co-transfected with pVRneo and
either pCMV(I-Sce I+) or pCMV(I-Sce I-). Cells were selected in
the indicated medium: geneticin (G418) or geneticin + gancyclovir
(G418 + Gls). The β-gal expression phenotype was determined by
X-gal histochemical staining. If an induced recombination between
the provirus and pVRneo occurs, the cells acquire a neoᴿ β-gal⁻
phenotype. B. Molecular analysis of a sample of recombinant
clones. RI: random integration of pVRneo, parental proviral
structure. DsHR: double site HR. SsHR: single site HR. Del:
deletion of the provirus (see also Fig. 20 and 23).



Verification of recombination by
Southern and Northern blot analysis

The molecular structure of neoᴿ recombinants has been
examined by Southern blot analysis (Fig. 23 and Table 1). HR at
I-Sce I sites predicts that digestion of recombinant DNA generates
a 6.4 kb LacZ fragment instead of the 4.2 kb parental fragment.
All 15 neoᴿ glsᴿ β-gal⁻ recombinants from NIH3T3 cells exhibited
only the 6.4 kb Kpn I fragment. Therefore, the double selection
procedure leads to only the expected recombinants created by gene
replacement (Double Site Homologous Recombinants, DsHR).

The 25 β-gal⁻ recombinants generated from the single
selection fell into four classes: (a) DsHR induced by I-Sce I as
above (19 clones); (b) integration of pVRneo in the left LTR, as
proven by the presence of a 4.2 kb Kpn I fragment (corresponding to
PhleoLacZ in the remaining LTR) in addition to the 6.4 kb
fragment (Fig. 23, Table 1, Single site Homologous Recombinants,
SsHR; 3 independent β-gal⁻ recombinants from clone 3). These
clones correspond to I-Sce I-induced HR at the left DSB only or
(less likely) to double crossing over between LTR and pVRneo;
(c) random pVRneo integrations (Table 1, Random Integrations, RI)
and simultaneous HR (Table 1, Deletion, Del) (1 β-gal⁻
recombinant); and (d) random pVRneo integration and simultaneous
deletion of provirus (1 β-gal⁻ recombinant). We suggest that this
fourth class corresponds to repair of DSBs with the homologous
chromosome. As expected, all β-gal⁺ recombinants from geneticin
selection alone correspond to random pVRneo integrations, whether
they originate from the experimental series (eight clones analyzed)
or from the control series (six clones analyzed).

We obtained additional evidence that recombination had
occurred at the I-Sce I site of PCC7-S/G-MPL 1 by analyzing the
RNAs produced in the parental cells and in the recombinant
(Fig. 24). Parental PCC7-S/G-MPL 1 cells express a 7.0 kb LacZ
RNA indicative of trapping of a cellular promoter leading to
expression of a cellular-viral fusion RNA. The recombinant clone
does not express this LacZ RNA but expresses a neo RNA of 5.0 kb.
The size of the neo RNA corresponds to the exact size expected for
an accurate exchange of PhleoLacZ by the neo gene and use of the same
cellular and viral splice sites (viral PhleoLacZ RNA in the LTR is
3.7 kb and neo RNA in pVRneo is 1.7 kb).
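The size argument above is a simple subtraction and addition, sketched here using only the three sizes quoted in the text. The function name is hypothetical, and the calculation assumes the rest of the fusion transcript is unchanged by the exchange.

```python
def expected_neo_rna_kb(parental_kb: float, phleolacz_kb: float, neo_kb: float) -> float:
    """Size of the fusion RNA after PhleoLacZ is exchanged for neo.

    Assumes every other part of the transcript keeps its length.
    """
    return round(parental_kb - phleolacz_kb + neo_kb, 1)


# Sizes quoted in the text: 7.0 kb parental LacZ RNA, 3.7 kb PhleoLacZ
# RNA in the LTR, 1.7 kb neo RNA in pVRneo.
print(expected_neo_rna_kb(7.0, 3.7, 1.7))  # 5.0
```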

DISCUSSION 

The results presented here demonstrate that double-strand
breaks can be induced by the I-Sce I system of Saccharomyces
cerevisiae in mammalian cells, and that the breaks in the target
chromosomal sequence induce site-specific recombination with input
plasmidic donor DNA.

To operate in mammalian cells, the system requires endogenous
I-Sce I-like activity to be absent from mammalian cells and the
I-Sce I protein to be neutral for mammalian cells. It is unlikely
that endogenous I-Sce I-like activity operates in mammalian cells,
as the introduction of I-Sce I recognition sites does not appear to
lead to rearrangement or mutation in the input DNA sequences. For
instance, all NIH3T3 and PCC7-S clones infected with a
retrovirus containing the I-Sce I restriction site stably
propagated the virus. To test for the toxicity of the I-Sce I gene
product, an I-Sce I expressing plasmid was introduced into the
NIH3T3 cell line (data not shown). A very high percentage of
cotransfer of a functional I-Sce I gene was found, suggesting no
selection against this gene. Functionality of the I-Sce I gene was
demonstrated by analysis of transcription, by immunofluorescence
detection of the gene product and by biological function (Choulika
et al., in preparation).

We next tested whether the endonuclease would cleave a
recognition site placed on a chromosome. This was accomplished by
placing two I-Sce I recognition sites separated by 5.8 or 7.2 kb
on a chromosome, one in each LTR of proviral structures, and by
analyzing the products of a recombination reaction with a
targeting vector in the presence of the I-Sce I gene product. Our
results indicate that in the presence of I-Sce I, the donor vector
recombines very efficiently with sequences within the two LTRs to
produce a functional neo gene. This suggests that I-Sce I induced
double strand breaks very efficiently at both I-Sce I sites. In
addition, as double strand breaks were obtained with at least five
distinct proviral insertions, the ability of the I-Sce I protein to
digest an I-Sce I recognition site is not highly dependent on
surrounding structures.

The demonstration of the ability of the I-Sce I meganuclease
to have biological function on chromosomal sites in mammalian cells
paves the way for a number of manipulations of the genome in
living organisms. In comparison with site-specific recombinases
(9B, 18B), the I-Sce I system is non-reversible. Site-specific
recombinases locate not only the sites for cutting the DNA, but
also those for rejoining, by bringing together the two partners. In
contrast, the only requirement with the I-Sce I system is homology
of the donor molecule with the region flanking the break induced
by the I-Sce I protein.

The results indicate for the first time that double strand
DNA breaks in chromosomal targets stimulate HR with introduced DNA
in mammalian cells. Because we used a combination of double
strand breaks (DSB) in chromosomal recipient DNA and super-coiled
donor DNA, we explored the stimulation by I-Sce I endonuclease of
recombination by the double strand break repair pathway (21B).
Therefore, the induced break is probably repaired by a gene
conversion event involving the concerted participation of both
broken ends which, after creation of a single-stranded region by 5'
to 3' exonucleolytic digestion, invade and copy DNA from the donor
copy. However, a number of studies of recombination in mammalian
cells and in yeast (10B, 11B, 19B) suggest that there is an
alternative pathway of recombination termed single-strand
annealing (SSA). In the SSA pathway, double-strand breaks are
substrates for the action of an exonuclease that exposes homologous
complementary single-strand DNA on the recipient and donor DNA.
Annealing of the complementary strands is then followed by a repair
process that generates recombinants. The I-Sce I system can be
used to evaluate the relative importance of the two pathways.



Example 5 

This example describes the use of the I-Sce I meganuclease
(involved in intron homing in mitochondria of the yeast
Saccharomyces cerevisiae) (6B, 28B) to induce DSB and mediate
recombination in mammalian cells. I-Sce I is a very rare-cutting
restriction endonuclease, with an 18 bp long recognition site
(29B, 22B). In vivo, I-Sce I endonuclease can induce
recombination in a modified yeast nucleus by initiating a specific
DSB leading to gap repair by the cell (30B, 17B, 21B). Therefore,
this approach can potentially be used as a means of introducing
specific DSB in chromosomal target DNA with a view to manipulating
chromosomes in living cells. The I-Sce I-mediated recombination
is superior to the recombinase system [11] for chromosome engineering
since the latter requires the presence of target sites on both
host and donor DNA molecules, leading to a reaction that is
reversible.
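To give a feel for what "very rare-cutting" means for an 18 bp site, the sketch below computes the average spacing at which a fixed site of a given length would occur by chance. The model is an assumption, not something stated in the text: a random sequence with equal base frequencies and a fully specified (non-degenerate) site. The genome size passed in the usage lines is likewise only an illustrative order of magnitude.

```python
def expected_spacing_bp(site_length_bp: int) -> int:
    """Average distance between chance occurrences of one fixed site.

    Assumes a random sequence with equal frequencies of A, C, G and T.
    """
    return 4 ** site_length_bp


def expected_sites(genome_bp: float, site_length_bp: int) -> float:
    """Expected number of chance occurrences in a genome of the given size."""
    return genome_bp / expected_spacing_bp(site_length_bp)


print(expected_spacing_bp(6))   # 4096 (a typical 6 bp restriction site)
print(expected_spacing_bp(18))  # 68719476736

# ASSUMPTION: 3e9 bp is used only as a rough mammalian genome size.
print(expected_sites(3e9, 18) < 1)  # True
```

Under these assumptions an 18 bp site is expected less than once per genome, which is why the site has to be introduced artificially before the enzyme can be used.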

Expression of the I-Sce I endonuclease induces recombination
events. Thus, I-Sce I activity can provoke site-directed double
strand breaks (DSBs) in a mammalian chromosome. At least two
types of events occur in the repair of the DSBs, one leading to
intra-chromosomal homologous recombination and the other to the
deletion of the transgene. These I-Sce I-mediated recombinations
occur at a frequency significantly higher than background.

MATERIALS AND METHODS 

Plasmid construction 

pG-MtkPL was obtained in five steps: (I) insertion of the 0.3
kbp Bgl II-Sma I fragment (treated with Klenow enzyme) of the
Moloney Murine Leukemia Virus (MoMuLV) env gene (25B) containing a
splice acceptor (SA) between the Nhe I and Xba I sites (treated
with Klenow enzyme), in the U3 sequence of the 3'LTR of MoMuLV, in
an intermediate plasmid. (II) Insertion in this modified LTR of a
3.5 kbp Nco I-Xho I fragment containing the PhleoLacZ fusion gene
[13] (from pUT65; Cayla Laboratory, Zone Commerciale du Gros,
Toulouse, France) at the Xba I site next to SA. (III) Insertion
of this 3'LTR (containing SA and PhleoLacZ), recovered by Sal I-
EcoR I double digestion, in the p5'LTR plasmid (a plasmid
containing the 5'LTR up to nucleotide n° 563 of MoMuLV [12])
between the Xho I and the EcoR I sites. (IV) Insertion of a
synthetic I-Sce I recognition site into the Nco I site in the
3'LTR (between SA and PhleoLacZ), and (V) insertion (antisense to
the retroviral genome) of the 1.6 kbp tk gene with its promoter
with linker adaptors at the Pst I site of pG-MPL.

pCMV(I-Sce I+) was obtained in two steps: (I) insertion of
the 0.73 kbp BamH I-Sal I, I-Sce I-containing fragment (from
pSCM525, donated by A. Thierry) into the phCMV1 (donated by
F. Meyer) plasmid cleaved with BamH I and Sal I, (II) insertion of
a 1.6 kbp fragment (nucleotide n° 3204 to 1988 in SV40) containing
the polyadenylation signal of SV40 at the Pst I site of phCMV1.

pCMV(I-Sce I-) contains the I-Sce I ORF in reverse
orientation in the pCMV(I-Sce I+) plasmid. It was obtained by
inserting the BamH I-Pst I I-Sce I ORF fragment (treated with
Klenow enzyme) into the phCMV PolyA vector linearized by Nsi I and
Sal I double-digestion and treated with Klenow enzyme.
Cell culture and selection 

NIH3T3 and ψ-2 are referenced in (7B) and (13B). Cell selection
medium: gancyclovir (14B, 23B) was added into the tissue culture
medium at a concentration of 2 µM. Gancyclovir selection was
maintained for 6 days. Phleomycin was used at a concentration of
10 µg/ml. Double selections were performed under the same
conditions.

Transfection, infection, cell staining and nucleic acids blot
analysis

These protocols were performed as described in (2B, 3B).

Virus-producing cell lines

The virus-producing cell line is generated by transfecting
pG-MtkPL into the ψ-2 packaging cell line. Virus was prepared
from the filtered culture medium of transfected ψ-2 cell lines.
NIH3T3 fibroblasts were infected by G-MtkPL, and clones were
isolated in a Phleomycin-containing medium.

RESULTS 

To assay for I-Sce I endonuclease activity in mammalian
cells, NIH3T3 cells containing the G-MtkPL provirus were used.
The G-MtkPL provirus (Fig. 25a) contains the tk gene (in place of
the gag, pol and env viral genes), for negative selection in
gancyclovir-containing medium and, in the two LTRs, an I-Sce I
recognition site and the PhleoLacZ fusion gene. The PhleoLacZ
gene can be used for positive selection of transduced cells in
phleomycin-containing medium.

We hypothesized that the expression of I-Sce I endonuclease
in these cells would induce double-strand breaks (DSB) at the
I-Sce I recognition sites that would be repaired by one of the
following mechanisms (illustrated in Fig. 25): a) if the I-Sce I
endonuclease induces a cut in only one of the two LTRs
(Fig. 25-b 1 and 2), sequences that are homologous between the two
LTRs could pair and recombine, leading to an intra-chromosomal
homologous recombination (i.e. by single strand annealing (SSA)
(12B, 10B) or crossing-over); b) if the I-Sce I endonuclease
induces a cut in each of the two LTRs, the two free ends can
religate (end joining mechanism (31B)), leading to an
intra-chromosomal recombination (Fig. 25-b 3); or alternatively c)
the gap created by the two DSBs can be repaired by a gap repair
mechanism using sequences either on the homologous chromosome or
on other chromosomal segments, leading to the loss of the proviral
sequences (32B) (Fig. 25-c).

The phenotype conferred to the NIH3T3 cells by the G-MtkPL
provirus is Phleoᴿ β-Gal⁺ Glsˢ. In a first series of
experiments, we searched for recombination by selecting for the
loss of the tk gene. NIH3T3/G-MtkPL 1 and 2 (two independent
clones with different proviral integration sites) were
transfected with the I-Sce I expression vector pCMV(I-Sce I+) or
with the control plasmid pCMV(I-Sce I-), which does not express the
I-Sce I endonuclease. The cells were then propagated in
Gancyclovir-containing medium to select for the loss of tk
activity. The resulting Glsᴿ clones were also assayed for
β-galactosidase activity by histochemical staining (with X-gal)
(Table 1).



Table 1

Number and nature of Gls resistant clones

I-Sce I expression      pCMV(I-SceI+)       pCMV(I-SceI-)
β-Gal activity            +       -           +       -

NIH3T3/G-MtkPL 1         11     154           0       0
NIH3T3/G-MtkPL 2         16     196           2       0

TABLE 1: Effect of I-Sce I expression on recombination frequency.
1×10⁶ cells of NIH3T3/G-MtkPL 1 and 2×10⁶ cells of NIH3T3/
G-MtkPL 2 were transfected with either pCMV(I-Sce I+) or
pCMV(I-Sce I-). Cells were cultivated in medium containing
gancyclovir. β-Galactosidase phenotype of the Glsᴿ clones was
determined by X-Gal histochemical staining.

In the control series transfected with pCMV(I-Sce I-), Gls
resistant clones were found at a low frequency (2 clones for
3×10⁶ treated cells) and the two were β-Gal⁺. In the
experimental series transfected with pCMV(I-Sce I+), expression of
the I-Sce I gene increased the frequency of Glsᴿ clones 100 fold.
These clones were either β-Gal⁻ (93%) or β-Gal⁺ (7%). Five
β-Gal⁻ clones from NIH3T3/G-MtkPL 1 and six from NIH3T3/
G-MtkPL 2 were analyzed by Southern blotting using Pst I
(Fig. 26). In the parental DNA, Pst I endonuclease cuts twice in
the tk gene of the provirus (Fig. 26a). The sizes of the two
PhleoLacZ containing fragments are determined by the position of
the Pst I sites in the flanking cellular DNA. In NIH3T3/G-MtkPL
1, these two PhleoLacZ fragments are 10 kbp long and in NIH3T3/
G-MtkPL 2 they are 7 and 9 kbp long. The five Glsᴿ β-Gal⁻
resistant clones from NIH3T3/G-MtkPL 1 and the six clones from
NIH3T3/G-MtkPL 2 all showed an absence of the tk gene and of the
two PhleoLacZ sequences (Fig. 26b and c).
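The percentages and fold increases quoted for Table 1 can be rechecked from the clone counts. The sketch below reads the table as β-Gal⁺ then β-Gal⁻ counts under each plasmid, which is an assumed column order inferred from the 93%/7% split in the text. It also assumes that the same number of cells was treated in both series, so that clone counts can be compared directly. The variable and function names are illustrative.

```python
# (beta_gal_plus, beta_gal_minus) Gls-resistant clones per parental line.
# ASSUMPTION: column order inferred from the percentages in the text.
TABLE_1 = {
    "pCMV(I-Sce I+)": {"NIH3T3/G-MtkPL 1": (11, 154), "NIH3T3/G-MtkPL 2": (16, 196)},
    "pCMV(I-Sce I-)": {"NIH3T3/G-MtkPL 1": (0, 0), "NIH3T3/G-MtkPL 2": (2, 0)},
}


def totals(series: str) -> tuple:
    """Sum beta-Gal+ and beta-Gal- clones over both parental lines."""
    rows = TABLE_1[series].values()
    return sum(plus for plus, _ in rows), sum(minus for _, minus in rows)


exp_plus, exp_minus = totals("pCMV(I-Sce I+)")
ctl_plus, ctl_minus = totals("pCMV(I-Sce I-)")
exp_total = exp_plus + exp_minus
ctl_total = ctl_plus + ctl_minus

print(round(100 * exp_minus / exp_total))  # 93 (% beta-Gal- clones)
print(round(100 * exp_plus / exp_total))   # 7  (% beta-Gal+ clones)
print(exp_total / ctl_total)               # 188.5, i.e. on the order of 100 fold
print(exp_plus / ctl_plus)                 # 13.5, i.e. about 10 fold
```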

In the experimental series the number of Glsᴿ β-Gal⁺ clones
is increased about 10 fold by I-Sce I expression in comparison to
the control series. These were not analyzed further.

In order to increase the number of Glsᴿ β-Gal⁺ clones
recovered, in a second set of experiments, the cells were grown in
a medium containing both Gancyclovir and Phleomycin. Gancyclovir
selects for cells that have lost tk activity and Phleomycin for
cells that maintained the PhleoLacZ gene. We transfected NIH3T3/
G-MtkPLs 1 and 2 with pCMV(I-Sce I+) or pCMV(I-Sce I-) (Table 2).





Table 2

Number of Phleo and Gls resistant clones

I-Sce I expression      pCMV(I-SceI+)       pCMV(I-SceI-)

NIH3T3/G-MtkPL 1              74                  2
NIH3T3/G-MtkPL 2             207                  9

TABLE 2: Effect of I-Sce I expression on the intra-chromosomal
recombination frequency. 2×10⁶ cells of NIH3T3/G-MtkPL 1 and
9×10⁶ cells of NIH3T3/G-MtkPL 2 were transfected with either
pCMV(I-Sce I+) or pCMV(I-Sce I-). Cells were cultured in
Phleomycin and gancyclovir containing medium.

In the control series, the frequency of recovery of Phleoᴿ
Glsᴿ resistant clones was 1×10⁻⁶. This result reflects cells that
have spontaneously lost tk activity, while still maintaining the
PhleoLacZ gene active. In the experimental series, this frequency
was raised about 20 to 30 fold, in agreement with the first set of
experiments (Table 1).
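The frequencies quoted for Table 2 follow from dividing clone counts by the number of transfected cells. The sketch below redoes that arithmetic. It assumes that the cell numbers in the Table 2 legend (2×10⁶ and 9×10⁶, the first exponent being read as 6) apply to both the experimental and the control transfection of each clone. The names are illustrative.

```python
# Cells transfected per parental clone (ASSUMED equal for both plasmids).
CELLS = {"NIH3T3/G-MtkPL 1": 2e6, "NIH3T3/G-MtkPL 2": 9e6}

# (clones with pCMV(I-Sce I+), clones with pCMV(I-Sce I-)) from Table 2.
CLONES = {"NIH3T3/G-MtkPL 1": (74, 2), "NIH3T3/G-MtkPL 2": (207, 9)}


def frequency(clones: int, cells: float) -> float:
    """Resistant clones recovered per transfected cell."""
    return clones / cells


def fold_increase(line: str) -> float:
    """Experimental frequency divided by control frequency for one clone."""
    experimental, control = CLONES[line]
    cells = CELLS[line]
    return frequency(experimental, cells) / frequency(control, cells)


for line in CLONES:
    control_freq = frequency(CLONES[line][1], CELLS[line])
    print(line, f"control {control_freq:.0e}", f"fold {fold_increase(line):.0f}")
```

With these inputs both control frequencies come out at 1×10⁻⁶, and the fold increases are 37 and 23, in line with the "about 20 to 30 fold" stated in the text.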

The molecular structure of the Phleoᴿ β-Gal⁺ Glsᴿ clones was
analyzed by Southern blotting (Fig. 27). Four clones from NIH3T3/
G-MtkPL 1 were analyzed, two from the experimental series and two
from the control. Their DNA was digested with Pst I endonuclease.
If an intra-chromosomal event had occurred, we expected a single
Pst I fragment of 13.6 kbp (that is, the sum of the three Pst I
fragments of the parental DNA minus the I-Sce I fragment, see
Fig. 27a). All four Phleoᴿ Glsᴿ resistant clones exhibited this
13.6 kbp Pst I fragment, suggesting a faithful intra-molecular
recombination (Fig. 27b).

DNA from eight clones from NIH3T3/G-MtkPL 2 cells was
analyzed by Southern blotting using Bcl I digestion (six from the
experimental series and two from the control). Bcl I digestion of
the parental DNA results in one 7.2 kbp fragment containing the
proviral sequences and in two flanking fragments of 6 kbp and 9.2
kbp. An intra-chromosomal recombination should result in the loss
of the 7.2 kbp fragment, leaving the two other bands of 6 kbp and
9.2 kbp unchanged (Fig. 27a). The eight clones (2.7 to 2.16)
showed the disappearance of the tk containing 7.2 kbp fragment,
indicative of an intra-chromosomal recombination between the two
LTRs (Fig. 27c).

DISCUSSION 

The results presented here demonstrate that the yeast I-Sce I
endonuclease induces chromosomal recombination in mammalian cells.
This strongly suggests that I-Sce I is able to cut a chromosome
in vivo at a predetermined target.

Double-strand breaks in genomic sequences of various species
stimulate recombination (21B, 19B). In diploid yeast, a
chromosomal DSB can lead to the use of the homo-allelic locus as a
repair matrix. This results in a gene conversion event, the locus
then becoming homozygous (30B). The chromosomal DSBs can also be
repaired by using homologous sequences of an ectopic locus as
matrix (32B). This result is observed at a significant level as a
consequence of a DSB gap repair mechanism. If the DSB occurs
between two direct-repeated chromosomal sequences, the mechanism
of recombination uses the single strand annealing (SSA) pathway
(11B, 10B). The SSA pathway involves three steps: 1) an
exonucleolysis initiated at the point of the break, leaving 3'
protruding single-strand DNAs; 2) a pairing of the two single
strand DNAs by their homologous sequences; 3) a repair of the DNA
by repair complexes and mutator genes, which resolve the
non-homologous sequences (33B). A special case concerns
haploid yeast, for which it has been shown that DSBs induced by HO
or I-Sce I endonucleases in a chromosome lead to the repair of
the break by end joining (34B). This occurs, but at a low
efficiency (30B, 35B).

Our results show that the presence of two I-Sce I sites in a
proviral target and the expression of the I-Sce I endonuclease
lead to an increase in the deletion of a thymidine kinase gene at
a frequency at least 100 fold greater than that occurring
spontaneously. Two types of tk deleted clones arise from I-Sce I
mediated recombination: clones that have kept (7%) and clones that
have lost (93%) the PhleoLacZ sequences.

The generation of tk⁻ PhleoLacZ⁺ cells is probably the
consequence of intra-chromosomal recombination. Studies have
shown that in a recombinant provirus with an I-Sce I recognition
site in the LTRs, the I-Sce I endonuclease leads in 20% of the
cases to the cleavage of only one proviral I-Sce I site and in 80%
to the cleavage of the two proviral I-Sce I sites. If only one of
the two I-Sce I sites is cut by the endonuclease, an
intra-chromosomal recombination can occur by the SSA pathway. If
the two I-Sce I sites are cut, the tk⁻ PhleoLacZ⁺ cells can be
generated by end joining, allowing intra-chromosomal recombination
(see Fig. 25). Although, in diploid yeast, this pathway is
not favorable (the break is repaired using homologous chromosomal
sequences) (2B), it remains possible that this pathway is used in
mammalian cells.

The generation of tk⁻/PhleoLacZ⁻ cells is probably a
consequence of a homo-allelic and/or an ectopic gene
conversion event (36B). Isolation and detailed molecular analysis
of the proviral integration sites will provide information on the
relative frequency of each of these events for the resolution of
chromosomal DSBs by the cell. This quantitative information is
important as, in mammalian cells, the high redundancy of genomic
sequences raises the possibility of a repair of DSBs by ectopic
homologous sequences. Ectopic recombination for repair of DSBs
may be involved in genome shaping and diversity in evolution [29].

The ability to digest specifically a chromosome at a 
predetermined genomic location has several potential applications 
for genome manipulation. 

The protocol of gene replacement described herein can be 
varied as follows: 

Variety of donor vectors 

Size and sequence of flanking regions of the I-Sce I site in
the donor plasmid (done with 300 bp left and 2.5 kb right):
Different constructions exist with various sizes of flanking
regions, up to a total of 11 kb left and right of the I-Sce I site.
The sequences depend on the construction (LTR, gene). Any
sequence of between 300 bp and 11 kb can be used.

- Inserts (neo, phleo, phleo-LacZ and Pytk-neo have been
constructed). Antibiotic resistance: neomycin, phleomycin;
reporter gene (LacZ); HSV1 thymidine kinase gene: sensitivity to
gancyclovir. It is possible to insert any kind of gene sequence
up to 10 kb or to replace it. The gene can be expressed under an
inducible or constitutive promoter of the retrovirus, or by gene
trap and homologous recombination (i.e. Insulin, Hbs, ILs and
various proteins).

Various methods can be used to express the enzyme I-Sce I: 
transient transfection (plasmid) or direct injection of protein 
(in embryo nucleus); stable transfection (various promoters like: 
CMV, RSV and MoMuLV) ; defective recombinant retroviruses 
(integration of ORF in chromosome under MoMuLV promoter) ; and 
episomes . 

Variation of host range to integrate I-Sce I site:
Recombinant retroviruses carrying the I-Sce I site (i.e. pG-MPL,
pG-MtkPL, pG-MtkΔPAPL) may be produced in various packaging cell
lines (amphotropic or xenotropic).

Construction of stable cell lines expressing I-Sce I
and cell protection against retroviral infection

Stable cell lines expressing I-Sce I are protected against
infection by a retroviral vector containing the I-Sce I site (i.e. NIH
3T3 cell line producing I-Sce I endonuclease under the control of
the CMV promoter is resistant to infection by a pG-MPL or pG-MtkPL
or I-Sce I under MoMuLV promoter in ψ-2 cells).

Construction of cell lines and transgenic 
animals containing the I-Sce I site 

Insertion of the I-Sce I site is carried out by a classical
gene replacement at the desired locus and at the appropriate
position. It is then possible to screen the expression of
different genes at the same location in the cell (insertion of the
donor gene at the artificially inserted I-Sce I site) or in a
transgenic animal. The effect of multiple drugs, ligands, medical
proteins, etc., can be tested in a tissue-specific manner. The
gene will consistently be inserted at the same location in the
chromosome.

For "Unprepared" mouse cells, and all eucaryotic cells, a one 
step gene replacement/integration procedure is carried out as 
follows : 

- Vectors (various donor plasmids) with I-Sce I site: one
site within the gene (or flanking) or two sites flanking the donor
gene.

- Method to express the enzyme 

Transient expression: ORF on the same plasmid or another 
(cotransfection) . 

Specific details regarding the methods used are described 
above. The following additional details allow the construction of 
the following: 

a cell line able to produce high titer of a variety of 
infective retroviral particles; 

a plasmid containing a defective retrovirus with I-Sce I sites,
reporter-selector gene, active LTRs and other essential retroviral
sequences; a plasmid containing sequences homologous to flanking
regions of I-Sce I sites in the above engineered retrovirus and
containing a multiple cloning site; and a vector allowing
expression of I-Sce I endonuclease and adapted to the specific
applications.

Mouse fibroblast ψ-2 cell line was used to produce ecotropic
defective recombinant retroviral vectors containing I-Sce I sites.
Cell lines producing plasmids such as pG-MPL, pG-MtkPL, pG-MtkΔPAPL
are also available. In addition, any cells, like mouse
amphotropic cell lines (such as PA12) or xenotropic cell lines,
that produce high titer infectious particles can be used for the
production of recombinant retroviruses carrying the I-Sce I site
(i.e., pG-MPL, pG-MtkPL, pG-MtkΔPAPL) in various packaging cell
lines (amphotropic, ecotropic or xenotropic).

A variety of plasmids containing I-Sce I can be used in
retroviral construction, including pG-MPL, pG-MtkPL, and
pG-MtkΔPAPL. Other kinds of plasmid vectors can be constructed
containing various promoters, genes, polyA sites, and I-Sce I sites.
A variety of plasmids containing sequences homologous to the flanking
regions of I-Sce I can be constructed. The size and sequence of the
flanking regions of the I-Sce I site in the donor plasmid are prepared
such that 300 kb are to the left and 2.5 kb are to the right.
Other constructions can be used with various sizes of flanking
regions of up to about 11 kb to the left and right of the I-Sce I
recognition site.

Inserts containing neomycin, phleomycin and phleo-LacZ have
been constructed. Other sequences can be inserted such as drug
resistance or reporter genes, including LacZ, HSV1 or thymidine
kinase gene (sensitivity to ganciclovir), insulin, CFTR, IL2 and
various proteins. It is normally possible to insert any kind of
sequence up to 12 kb, wherein the size depends on the
encapsidation capacity of the virus. The gene can be expressed under
an inducible or constitutive promoter of the retrovirus, or by gene
trap after homologous recombination.

A variety of plasmids containing I-Sce I producing the
endonuclease can be constructed. Expression vectors such as
pCMVI-Sce I (+) or similar constructs containing the ORF can be
introduced in cells by transient transfection, electroporation or
lipofection. The protein can also be introduced directly into the
cell by injection of liposomes.

A variety of cell lines with integrated I-Sce I sites can be
produced. Preferably, insertion of the retrovirus (proviral
integration) induces duplication of the LTR containing the I-Sce I
site. The cell will be hemizygous for the site. Appropriate cell
lines include:

1. Mouse Fibroblastic cell line, NIH 3T3 with 1 to 14
proviral integrations of G-MPL. Multiple (more than 30) clones
were recovered. The presence and the multiplicity of the
different genomic integrations (uncharacterized) were verified by
molecular analysis.

2. Mouse Fibroblastic cell line, NIH 3T3 with 1 copy of
G-MtkPL integrated in the genome. 4 clones were recovered.

3. Mouse Embryonal Carcinoma cell line, PCC7-S with 1 to 4
copies of G-MPL proviral integration in the genome. 14 clones
were recovered.

4. Mouse Embryonal Carcinoma cell line, PCC4 with 1 copy of
G-MtkPL integrated in the genome.

5. Mouse Embryonic Stem cell line D3 with 1 to 4 copies of
G-MPL at a variety of genomic locations (uncharacterized). 4
clones were recovered.

Construction of other cell lines and transgenic animals
containing the I-Sce I site can be done by insertion of the
I-Sce I site by a classical gene replacement at the desired locus
and at the appropriate position. Any kind of animal or plant cell
line could a priori be used to integrate I-Sce I sites at a
variety of genomic locations with adapted cell lines. The
invention can be used as follows:

1. Site specific gene insertion 

The methods allow the production of an unlimited number of 
cell lines in which various genes or mutants of a given gene can 
be inserted at the predetermined location defined by the previous 
integration of the I-Sce I site. Such cell lines are thus useful 
for screening procedures, for phenotypes, ligands, drugs and for 
reproducible expression at a very high level of recombinant 
retroviral vectors if the cell line is a transcomplementing cell 
line for retrovirus production. 

The above mouse cells or equivalents from other vertebrates,
including man, can be used. Any plant cells that can be
maintained in culture can also be used independently of whether
they have the ability to regenerate or not, or whether or not they
have given rise to fertile plants. The methods can also be used
with transgenic animals.

2. Site specific gene expression 

Similar cell lines can also be used to produce proteins, 
metabolites or other compounds of biological or biotechnological 
interest using a transgene, a variety of promoters, regulators 
and/or structural genes. The gene will always be inserted at the
same location in the chromosome. In transgenic animals, this
makes it possible to test the effect of multiple drugs, ligands, or
medical proteins in a tissue-specific manner.

3. Insertion of the I-Sce I recognition site in the CFTR
locus using homologous sequences flanking the CFTR gene in the
genomic DNA. The I-Sce I site can be inserted by spontaneous gene
replacement by double crossing-over (Le Mouellic et al. PNAS,
1990, Vol. 87, 4712-4716).

4. Biomedical applications

A. In gene therapy, cells from a patient can be
infected with an I-Sce I containing retrovirus, screened for
integration of the defective retrovirus and then co-transformed
with the I-Sce I producing vector and the donor sequence.

Examples of appropriate cells include hematopoietic tissue,
hepatocytes, skin cells, endothelial cells of blood vessels or any 
stem cells. 

I-Sce I containing retroviruses include pG-MPL, pG-MtkPL or
any kind of retroviral vector containing at least one I-Sce I
site.

I-Sce I producing vectors include pCMVI-Sce I (+) or any
plasmid allowing transient expression of I-Sce I endonuclease.

Donor sequences include (a) genomic sequences containing the
complete IL2 gene; (b) genomic sequences containing the
pre-proinsulin gene; (c) a large fragment of vertebrate, including
human, genomic sequence containing cis-acting elements for gene
expression. Modified cells are then reintroduced into the patient
according to established protocols for gene therapy.

B. Insertion of a promoter (i.e., CMV) with the I-Sce I
site, in a stem cell (i.e., lymphoid). A gap repair molecule
containing a linker (multicloning site) can be inserted between
the CMV promoter and the downstream sequence. The insertion of a
gene (i.e., IL-2 gene), present in the donor plasmids, can be done
efficiently by expression of the I-Sce I meganuclease (i.e.,
co-transfection with an I-Sce I meganuclease expression vector). The
direct insertion of the IL-2 gene under the CMV promoter leads to the
direct selection of a stem cell over-expressing IL-2.

For constructing transgenic cell lines, a retroviral
infection is used in presently available systems. Other methods to
introduce I-Sce I sites within genomes can be used, including
micro-injection of DNA, Ca-phosphate induced transfection,
electroporation, lipofection, protoplast or cell fusion, and
bacterial-cell conjugation.

Loss of heterozygosity is demonstrated as follows: The
I-Sce I site is introduced in a locus (with or without foreign
sequences), creating a heterozygous insertion in the cell. In the
absence of repair DNA, the induced double-strand break will be
extended by non-specific exonucleases, and the gap repaired by the
intact sequence of the sister chromatid; thus the cell becomes
homozygous at this locus.

Specific examples of gene therapy include immunomodulation
(i.e. changing range or expression of IL genes); replacement of
defective genes; and excretion of proteins (i.e. expression of
various secretory proteins in organelles).

It is possible to activate a specific gene in vivo by I-Sce I
induced recombination. The I-Sce I cleavage site is introduced
between a duplication of a gene in tandem repeats, creating a loss
of function. Expression of the endonuclease I-Sce I induces
cleavage between the two copies. Repair by recombination
is stimulated and results in a functional gene.
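
The tandem-repeat activation described above can be pictured with a toy string model. This is only an illustrative sketch: the repeat sequence, the flanking bases and the function name `collapse_tandem_repeat` are hypothetical, and the 18-base site is taken as the last 18 bases of SEQ ID NO: 7, which is an assumption about where the recognition site sits in that sequence. Repair is modeled simply as collapsing repeat + site + repeat into a single repeat copy.

```python
# Toy model only: Python strings stand in for DNA, nothing here is a
# biological simulation.

# Assumption: the I-Sce I site is the last 18 bases of SEQ ID NO: 7.
ISCEI_SITE = "TAGGGATAACAGGGTAAT"


def collapse_tandem_repeat(locus: str, repeat: str, site: str = ISCEI_SITE) -> str:
    """Replace the first 'repeat + site + repeat' with one 'repeat' copy.

    The locus is returned unchanged when the site is not flanked on both
    sides by the repeat.
    """
    disrupted = repeat + site + repeat
    if disrupted not in locus:
        return locus
    return locus.replace(disrupted, repeat, 1)


repeat = "ACGTTGCA"  # hypothetical duplicated gene segment
inactive = "GGCC" + repeat + ISCEI_SITE + repeat + "TTAA"  # interrupted gene
active = collapse_tandem_repeat(inactive, repeat)          # restored gene

print(inactive)
print(active)
```

In this model the restored locus no longer contains the site, so a second round of cleavage would have no target.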

Site-directed genetic macro-rearrangements of chromosomes in
cell lines or in organisms.

Specific translocation of chromosomes or deletion can be
induced by I-Sce I cleavage. Locus insertion can be obtained by
integration of one site at a specific location in the chromosome by
"classical gene replacement." Cleavage of the recognition
sequence by I-Sce I endonuclease can be repaired by non-lethal
translocations or by deletion followed by end-joining. A deletion
of a fragment of chromosome could also be obtained by insertion of
two or more I-Sce I sites in flanking regions of a locus (see
figure 32). The cleavage can be repaired by recombination and
results in deletion of the complete region between the two sites
(see figure 32).
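
The two-site deletion outcome can be sketched the same way with plain strings. The flanking and locus sequences and the function name `delete_between_sites` are hypothetical, the 18-base site is assumed to be the last 18 bases of SEQ ID NO: 7, and keeping one copy of the site after repair is a modeling choice rather than something the description states.

```python
# Toy model only: the chromosome is a plain string.

# Assumption: the I-Sce I site is the last 18 bases of SEQ ID NO: 7.
ISCEI_SITE = "TAGGGATAACAGGGTAAT"


def delete_between_sites(chromosome: str, site: str = ISCEI_SITE) -> str:
    """Remove everything between the first and last occurrence of `site`.

    One copy of the site is kept (a modeling choice). With fewer than two
    sites the chromosome is returned unchanged.
    """
    first = chromosome.find(site)
    last = chromosome.rfind(site)
    if first == -1 or first == last:
        return chromosome
    return chromosome[: first + len(site)] + chromosome[last + len(site):]


left, locus, right = "AAAACCCC", "GATTACAGATTACA", "GGGGTTTT"  # hypothetical
chromosome = left + ISCEI_SITE + locus + ISCEI_SITE + right
deleted = delete_between_sites(chromosome)

print(len(chromosome), len(deleted))
```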

WE CLAIM:

1. A method of inducing at least one site-directed double- 
strand break in DNA of a cell, said method comprising 

(a) providing cells containing double-stranded DNA,
wherein said DNA comprises at least one I-Sce I restriction site;

(b) transfecting said cells with at least a plasmid
comprising DNA encoding the I-Sce I meganuclease; and

(c) selecting cells in which at least one double-strand 
break has been induced. 

2. The method of claim 1, wherein said cell is selected 
from the group consisting of a mammalian cell, a yeast cell, and a 
plant cell. 

3. The method of claim 2, wherein said cell is an NIH3T3
cell containing the G-MtkPL virus.

4. The method of claim 1, wherein said plasmid is
pCMV(I-Sce I+).

5. A method of inducing homologous recombination between 
chromosomal DNA of a cell and exogenous DNA added to said cell, 
said method comprising 

(a) providing cells containing chromosomal DNA, wherein said
DNA comprises at least one I-Sce I restriction site;

(b) transfecting said cells with a plasmid comprising
exogenous DNA, and with a plasmid comprising DNA encoding the
I-Sce I meganuclease; and

(c) selecting cells in which said exogenous DNA is inserted 
into said chromosomal DNA. 

6. The method of claim 5, wherein said cell is selected
from the group consisting of a mammalian cell, a yeast cell, and a
plant cell.

7. The method of claim 6, wherein said cell is an NIH3T3 cell
containing the G-MtkPL virus.

8. The method of claim 5, wherein said plasmid is
pCMV(I-Sce I+).

9. A method of inducing homologous recombination between 
chromosomal DNA of a cell and exogenous DNA added to said cell, 
said method comprising

(a) providing cells comprising chromosomal DNA; 

(b) inserting at least one I-Sce I restriction site in said 
chromosomal DNA; 

(c) transfecting said cells with a first plasmid comprising 
exogenous DNA, and with a second plasmid comprising DNA encoding 
the I-Sce I meganuclease; and 

(d) selecting cells in which said exogenous DNA is inserted 
into said chromosomal DNA. 

10. The method of claim 9, wherein said cell is selected 
from the group consisting of a mammalian cell, a yeast cell, and a 
plant cell. 

11. The method of claim 9, wherein said first plasmid is
pCMV(I-Sce I+).

12. The method of claim 9, wherein said second plasmid is 
pVRneo. 

13. A method of inducing at least one site-directed break in 
DNA of a cell and inserting DNA encoding a polypeptide, said 
method comprising, 

(a) providing cells containing double-stranded DNA, wherein
said cells are capable of being transformed by a DNA comprising an
I-Sce I restriction site and DNA encoding said polypeptide;

(b) adding Sce I enzyme or transforming said cell with DNA
encoding Sce I enzyme;

(c) transfecting said cells with said DNA encoding said 
polypeptide or with a vector containing said DNA; and 

(d) selecting cells transfected with said DNA or said 
vector, wherein said cells express said polypeptide. 

14. A recombinant eukaryotic cell transformed by the method
of any one of claims 1 and 13.

15. A transgenic animal comprising a cell transformed by the
method of any one of claims 1 and 13.

16. A method of expressing a polypeptide in a transgenic 
animal, said method comprising transforming embryonic stem cells 
with a DNA comprising an I-Sce I restriction site and DNA encoding
said polypeptide, and detecting expression of said polypeptide in 
a transgenic animal resulting from said transformed embryonic stem 
cells. 

17. A recombinant stem cell expressing a polypeptide, 
wherein said stem cell is transformed by a DNA comprising an I-Sce
I restriction site and DNA encoding said polypeptide by

(a) adding Sce I enzyme to said cell or transforming said
cell with a vector containing the gene coding for Sce I enzyme;

(b) transfecting said cells with said DNA encoding said 
polypeptide; and 

(c) selecting cells transfected with said DNA, wherein said 
cells express said polypeptide. 

18. A recombinant eukaryotic cell as claimed in any one of
claims 4 and 7 wherein said polypeptide is a foreign antigen to
the cell.

19. The recombinant eukaryotic cell as claimed in claim 14 
wherein cell is a mammalian cell line. 

20. The recombinant eukaryotic cell as claimed in claim 14 
wherein cell is a yeast. 

21. A method of inducing at least one site-directed break in 
DNA of cells and inserting DNA encoding a polypeptide, wherein 
said cells express at least one protein product, said method
comprising, 

(a) providing cells containing double-stranded DNA, wherein
said cells are capable of being transformed by a DNA comprising an
I-Sce I restriction site and DNA encoding said polypeptide;

(b) adding Sce I enzyme to said cells or transforming said
cells with DNA encoding Sce I enzyme;

(c) transfecting said cells with said DNA encoding said 
polypeptide or with a vector containing said DNA; and 

(d) selecting cells transfected with said DNA or said
vector, wherein said cells express said polypeptide and do not 
express said protein product. 

22. A recombinant cell transformed by the method of claim 21.

ABSTRACT

An isolated DNA encoding the enzyme I-Sce I is provided. The
DNA sequence can be incorporated in cloning and expression
vectors, transformed cell lines and transgenic animals. The
vectors are useful in gene mapping and site-directed insertion of
genes.

[Figure text: The Mitochondrial I-Sce I Gene.]

[Figure text: The synthetic I-Sce I gene.]

[Figure annotation: These amino acids are absolutely necessary to produce
catalytic activity. Other substitutions are possible, such as deletion of
the first 10 amino acids.]

[Figure text: Pairing and recombination. 3. Both I-Sce I sites are cut.
c. Inter-chromosomal recombination event: both I-Sce I sites are cut;
gap repair using intact chromosome sequences.]

Figure 28

[Figure text: LOSS OF HETEROZYGOSITY. Integration of artificial site or
presence of specific site (I-Sce I); expression of I-Sce I and specific
cleavage; repair of the DSB with the other chromatid.]

Figure 29

[Figure text: CONDITIONAL ACTIVATION (Tandem repeat). Integration of
artificial site (I-Sce I) between tandem repeats; gene X inactive;
expression of I-Sce I and specific cleavage; repair of the DSB by single
strand annealing; gene X active.]

[Figure text: ONE STEP REARRANGEMENT.]

Figure 30

SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT: Choulika, Andre 

Perrin, Arnaud 
Dujon, Bernard 
Nicolas, Jean-Francois 

(ii) TITLE OF INVENTION: Nucleotide Sequence Encoding the Enzyme 
I-SCEI and the Uses Thereof 

(iii) NUMBER OF SEQUENCES: 52 
(A) ADDRESSEE: Finnegan, Henderson, Farabow, Garrett &
Dunner

(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 09/244,130 

(B) FILING DATE: 04-FEB-1999 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 09/119,024 

(B) FILING DATE: 20-JUL-1998 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 08/336,241 

(B) FILING DATE: 07-NOV-1994 

(vii) PRIOR APPLICATION DATA:

(vii) PRIOR APPLICATION DATA:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Meyers, Kenneth J. 

(B) REGISTRATION NUMBER: 25,146 

(C) REFERENCE/DOCKET NUMBER: 3495-0111-11



(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 202-408-4000 

(B) TELEFAX: 202-408-4400 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 714 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ATGCATATGA AAAACATCAA AAAAAACCAG GTAATGAACC TCGGTCCGAA CTCTAAACTG 60

CTGAAAGAAT ACAAATCCCA GCTGATCGAA CTGAACATCG AACAGTTCGA AGCAGGTATC 120

GGTCTGATCC TGGGTGATGC TTACATCCGT TCTCGTGATG AAGGTAAAAC CTACTGTATG 180

CAGTTCGAGT GGAAAAACAA AGCATACATG GACCACGTAT GTCTGCTGTA CGATCAGTGG 240

GTACTGTCCC CGCCGCACAA AAAAGAACGT GTTAACCACC TGGGTAACCT GGTAATCACC 300

TGGGGCGCCC AGACTTTCAA ACACCAAGCT TTCAACAAAC TGGCTAACCT GTTCATCGTT 360

AACAACAAAA AAACCATCCC GAACAACCTG GTTGAAAACT ACCTGACCCC GATGTCTCTG 420

GCATACTGGT TCATGGATGA TGGTGGTAAA TGGGATTACA ACAAAAACTC TACCAACAAA 480

TCGATCGTAC TGAACACCCA GTCTTTCACT TTCGAAGAAG TAGAATACCT GGTTAAGGGT 540

CTGCGTAACA AATTCCAACT GAACTGTTAC GTAAAAATCA ACAAAAACAA ACCGATCATC 600

TACATCGATT CTATGTCTTA CCTGATCTTC TACAACCTGA TCAAACCGTA CCTGATCCCG 660

CAGATGATGT ACAAACTGCC GAACACTATC TCCTCCGAAA CTTTCCTGAA ATAA 714



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 237 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met His Met Lys Asn Ile Lys Lys Asn Gln Val Met Asn Leu Gly Pro
1 5 10 15

Asn Ser Lys Leu Leu Lys Glu Tyr Lys Ser Gln Leu Ile Glu Leu Asn
20 25 30

Ile Glu Gln Phe Glu Ala Gly Ile Gly Leu Ile Leu Gly Asp Ala Tyr
35 40 45

Ile Arg Ser Arg Asp Glu Gly Lys Thr Tyr Cys Met Gln Phe Glu Trp
50 55 60

Lys Asn Lys Ala Tyr Met Asp His Val Cys Leu Leu Tyr Asp Gln Trp
65 70 75 80

Val Leu Ser Pro Pro His Lys Lys Glu Arg Val Asn His Leu Gly Asn
85 90 95

Leu Val Ile Thr Trp Gly Ala Gln Thr Phe Lys His Gln Ala Phe Asn
100 105 110

Lys Leu Ala Asn Leu Phe Ile Val Asn Asn Lys Lys Thr Ile Pro Asn
115 120 125

Asn Leu Val Glu Asn Tyr Leu Thr Pro Met Ser Leu Ala Tyr Trp Phe
130 135 140

Met Asp Asp Gly Gly Lys Trp Asp Tyr Asn Lys Asn Ser Thr Asn Lys
145 150 155 160

Ser Ile Val Leu Asn Thr Gln Ser Phe Thr Phe Glu Glu Val Glu Tyr
165 170 175

Leu Val Lys Gly Leu Arg Asn Lys Phe Gln Leu Asn Cys Tyr Val Lys
180 185 190

Ile Asn Lys Asn Lys Pro Ile Ile Tyr Ile Asp Ser Met Ser Tyr Leu
195 200 205

Ile Phe Tyr Asn Leu Ile Lys Pro Tyr Leu Ile Pro Gln Met Met Tyr
210 215 220

Lys Leu Pro Asn Thr Ile Ser Ser Glu Thr Phe Leu Lys
225 230 235
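
As a consistency check on the two listings above, the first 60 bases of SEQ ID NO: 1 can be translated and compared with the start of SEQ ID NO: 2. The sketch below assumes the standard (universal) genetic code applies to SEQ ID NO: 1; the names `CODON_TABLE`, `translate` and `SEQ1_START` are illustrative and do not come from the application.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter amino acid codes ("Ter" marks a stop codon).
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First row (bases 1 to 60) of SEQ ID NO: 1, with the block spacing removed.
SEQ1_START = "".join(
    "ATGCATATGA AAAACATCAA AAAAAACCAG GTAATGAACC TCGGTCCGAA CTCTAAACTG".split()
)

protein = translate(SEQ1_START)
print(" ".join(THREE[a] for a in protein))
```

Under that assumption the 20 residues printed agree with positions 1 to 20 of SEQ ID NO: 2.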

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 722 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

AAAAATAAAA TCATATGAAA AATATTAAAA AAAATCAAGT AATCAATCTC GGTCCTATTT 60 

CTAAATTATT AAAAGAATAT AAATCACAAT TAATTGAATT AAATATTGAA CAATTTGAAG 120 

CAGGTATTGG TTTAATTTTA GGAGATGCTT ATATTCGTAG TCGTGATGAA GGTAAAACTT 180



ATTGTATGCA ATTTGAGTGG AAAAATAAGG CATACATGGA TCATGTATGT TTATTATATG 240 

ATCAATGGGT ATTATCACCT CCTCATAAAA AAGAAAGAGT TAATCATTTA GGTAATTTAG 300 

TAATTACCTG GGGAGCTCAA ACTTTTAAAC ATCAAGCTTT TAATAAATTA GCTAACTTAT 360 

TTATTGTAAA TAATAAAAAA CTTATTCCTA ATAATTTAGT TGAAAATTAT TTAACACCTA 420 

TGAGTCTGGC ATATTGGTTT ATGGATGATG GAGGTAAATG GGATTATAAT AAAAATTCTC 480

TTAATAAAAG TATTGTATTA AATACACAAA GTTTTACTTT TGAAGAAGTA GAATATTTAC 540

TTAAAGGTTT AAGAAATAAA TTTCAATTAA ATTGTTATGT TAAAATTAAT AAAAATAAAC 600 

CAATTATTTA TATTGATTCT ATGAGTTATC TGATTTTTTA TAATTTAATT AAACCTTATT 660 

TAATTCCTCA AATGATGTAT AAACTGCCTA ATACTATTTC ATCCGAAACT TTTTTAAAAT 720 

AA 722 

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 235 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Lys Asn Ile Lys Lys Asn Gln Val Met Asn Leu Gly Pro Asn Ser
1 5 10 15

Lys Leu Leu Lys Glu Tyr Lys Ser Gln Leu Ile Glu Leu Asn Ile Glu
20 25 30

Gln Phe Glu Ala Gly Ile Gly Leu Ile Leu Gly Asp Ala Tyr Ile Arg
35 40 45

Ser Arg Asp Glu Gly Lys Thr Tyr Cys Met Gln Phe Glu Trp Lys Asn
50 55 60

Lys Ala Tyr Met Asp His Val Cys Leu Leu Tyr Asp Gln Trp Val Leu
65 70 75 80

Ser Pro Pro His Lys Lys Glu Arg Val Asn His Leu Gly Asn Leu Val
85 90 95

Ile Thr Trp Gly Ala Gln Thr Phe Lys His Gln Ala Phe Asn Lys Leu
100 105 110

Ala Asn Leu Phe Ile Val Asn Asn Lys Lys Leu Ile Pro Asn Asn Leu
115 120 125

Val Glu Asn Tyr Leu Thr Pro Met Ser Leu Ala Tyr Trp Phe Met Asp
130 135 140

Asp Gly Gly Lys Trp Asp Tyr Asn Lys Asn Ser Leu Asn Lys Ser Ile
145 150 155 160

Val Leu Asn Thr Gln Ser Phe Thr Phe Glu Glu Val Cys Tyr Leu Val
165 170 175

Lys Gly Leu Arg Asn Lys Phe Gln Leu Asn Cys Tyr Val Lys Ile Asn
180 185 190

Lys Asn Lys Pro Ile Ile Tyr Ile Asp Ser Met Ser Tyr Leu Ile Phe
195 200 205

Tyr Asn Ile Ile Lys Pro Tyr Leu Ile Pro Gln Met Met Tyr Lys Leu
210 215 220

Pro Asn Thr Ile Ser Ser Glu Thr Phe Leu Lys
225 230 235

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 754 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 



CCGGATCCAT GCATATGAAA AACATCAAAA AAAACCAGGT AATGAACCTG GGTCCGAACT 60

CTAAACTGCT GAAAGAATAC AAATCCCAGC TGATCGAACT GAACATCGAA CAGTTCGAAG 120

CAGGTATCGG TCTGATCCTG GGTGATGCTT ACATCCGTTC TCGTGATGAA GGTAAAACCT 180

ACTGTATGCA GTTCGAGTGG AAAAACAAAG CATACATGGA CCACGTATGT CTGCTGTACG 240

ATCAGTGGGT ACTGTCCCCG CCGCACAAAA AACAACGTGT TAACCACCTG GGTAACCTGG 300

TAATCACCTG GGGCGCCCAG ACTTTCAAAC ACCAAGCTTT CAACAAACTG GCTAACCTGT 360

TCATCGTTAA CAACAAAAAA ACCATCCCGA ACAACCTGGT TGAAAACTAC CTGACCCCGA 420

TGTCTCTGGC ATACTGGTTC ATGGATGATG GTGGTAAATG GGATTACAAC AAAAACTCTA 480

CCAACAAATC GATCGTACTG AACACCCAGT CTTTCACTTT CGAAGAAGTA GAATACCTGG 540

TTAAGGGTCT GCGTAACAAA TTCCAACTGA ACTGTTACGT AAAAATCAAC AAAAACAAAC 600

CGATCATCTA CATCGATTCT ATGTCTTACC TGATCTTCTA CAACCTGATC AAACCGTACC 660

TGATCCCGCA GATGATGTAC AAACTGCCGA ACACTATCTC CTCCGAAACT TTCCTGAAAT 720

AATAAGTCGA CTGCAGGATC CGGTAAGTAA GTAA 754



(2) INFORMATION FOR SEQ ID NO: 6:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6 
AATGCTTTCC A 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7 
GTTACGCTAG GGATAACAGG GTAAT 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8 
CAATGCGATC CCTATTGTCC CATTA 
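
Read side by side, SEQ ID NO: 7 and SEQ ID NO: 8 appear to be the two strands of one duplex written in register. A minimal check of that observation follows; the helper name `complement` and the variable names are illustrative, and the sequences are copied from the listing with the block spacing removed.

```python
# Base-by-base complement table for unambiguous DNA letters.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Return the base-by-base complement (same left-to-right order)."""
    return seq.translate(COMPLEMENT)


SEQ7 = "".join("GTTACGCTAG GGATAACAGG GTAAT".split())
SEQ8 = "".join("CAATGCGATC CCTATTGTCC CATTA".split())

# True when SEQ ID NO: 8 is the complement of SEQ ID NO: 7, position by position.
print(complement(SEQ7) == SEQ8)

# Reading the complementary strand 5'->3' instead gives the reverse complement.
print(complement(SEQ7)[::-1])
```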
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1738 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9



GCGGACAGGT ATCCGGTAAG CGGCAGGGTC GGAACAGGAG AGCGCACGAG GGAGCTTCCA 60 

GGGGGAAACG CCTGGTATCT TTATAGTCCT GTCGGGTTTC GCCACCTCTG ACTTGAGCGT 120 

CGATTTTTGT GATGCTCGTC AGGGGGGCGG AGCCTATGGA AAAACGCCAG CAACGCGGCC 180 

TTTTTACGGT TCCTGGCCTT TTGCTGGCCT TTTGCTCACA TGTTCTTTCC TGCGTTATCC 240 

CCTGATTCTG TGGATAACCG TATTACCGCC TTTGAGTGAG CTGATACCGC TCGCCGCAGC 300

CGAACGACCG AGCGCAGCGA GTCAGTGAGC GAGGAAGCGG AAGAGCGCCC AATACGCAAA 360 

CCGCCTCTCC CCGCGCGTTG GCCGATTCAT TAATGCAGCT GGCACGACAG GTTTCCCGAC 420 

TGGAAAGCGG GCAGTGAGCG CAACGCAATT AATGTGAGTT AGCTCACTCA TTAGGCACCC 480

CAGGCTTTAC ACTTTATGCT TCCGGCTCGT ATGTTGTGTG GAATTGTGAG CGGATAACAA 540 

TTTCACACAG GAAACAGCTA TGACCATGAT TACGAATTCT CATGTTTGAC AGCTTATCAT 600 

CGATAAGCTT TAATGCGGTA GTTTATCACA GTTAAATTGC TAACGCAGTC AGGCACCGTG 660 

TATGAAATCT AACAATGCGC TCATCGTCAT CCTCGGCACC GTCACCCTGG ATGCTGTAGG 720

CATAGGCTTG GTTATGCCGG TACTGCCGGG CCTCTTGCGG GATATCCGCC TGATGCGTGA 780

ACGTGACGGA CGTAACCACC GCGACATGTG TGTGCTGTTC CGCTGGGCAT GCCAGGACAA 840 

CTTCTGGTCC GGTAACGTGC TGAGCCCGGC CAAGCTTACT CCCCATCCCC CTGTTGACAA 900 

TTAATCATCG GCTCGTATAA TGTGTGGAAT TGTGAGCGGA TAACAATTTC ACACAGGAAA 960 

CAGGATCCAT GCATATGAAA AACATCAAAA AAAACCAGGT AATGAACCTG GGTCCGAACT 1020 

CTAAACTGCT GAAAGAATAC AAATCCCAGC TGATCGAACT GAACATCGAA CAGTTCGAAG 1080 

CAGGTATCGG TCTGATCCTG GGTGATGCTT ACATCCGTTC TCGTGATGAA GGTAAAACCT 1140

ACTGTATGCA GTTCGAGTGG AAAAACAAAG CATACATGGA CCACGTATGT CTGCTGTACG 1200 

ATCAGTGGGT ACTGTCCCCG CCGCACAAAA AAGAACGTGT TAACCACCTG GGTAACCTGG 1260 

TAATCACCTG GGGCGCCCAG ACTTTCAAAC ACCAAGCTTT CAACAAACTG GCTAACCTGT 1320 

TCATCGTTAA CAACAAAAAA ACCATCCCGA ACAACCTGGT TGAAAACTAC CTGACCCCGA 1380 

TGTCTCTGGC ATACTGGTTC ATGGATGATG GTGGTAAATG GGATTACAAC AAAAACTCTA 1440

CCAACAAATC GATCGTACTG AACACCCAGT CTTTCACTTT CGAAGAAGTA GAATACCTGG 1500 

TTAAGGGTCT GCGTAACAAA TTCCAACTGA ACTGTTACGT AAAAATCAAC AAAAACAAAC 1560

CGATCATCTA CATCGATTCT ATGTCTTACC TGATCTTCTA CAACCTGATC AAACCGTACC 1620 

TCATCCCCCA GATGATGTAC AAACTGCCGA ACACTATCTC CTCCGAAACT TTCCTGAAAT 1680

AATAAGTCGA CCTGCAGCCC AAGCTTGGCA CTGGCCGTCG TTTTACAACG TCGTGACT 1738
(2) INFORMATION FOR SEQ ID NO: 10: 



(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 37 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Leu Val Arg Gly Ala Glu Pro Met Glu Lys Arg Gln Gln Arg Gly
1 5 10 15

Leu Phe Thr Val Pro Gly Leu Leu Leu Ala Phe Cys Ser His Val Leu
20 25 30

Ser Cys Val Ile Pro
35

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Met Gln Leu Ala Arg Gln Val Ser Arg Leu Glu Ser Gly Gln
1 5 10

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Met Leu Pro Ala Arg Met Leu Cys Gly Ile Val Ser Gly
1 5 10

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Met Thr Met Ile Thr Asn Ser His Val
1 5 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Met Lys Ser Asn Asn Ala Leu Ile Val Ile Leu Gly Thr Val Thr Leu
1 5 10 15

Asp Ala Val Gly Ile Gly Leu Val Met Pro Val Leu Pro Gly Leu Leu
20 25 30

Arg Asp Ile Arg Leu Met Arg Glu Arg Asp Gly Arg Asn His Arg Asp
35 40 45

Met Cys Val Leu Phe Arg Trp Ala Cys Gln Asp Asn Phe Trp Ser Gly
50 55 60

Asn Val Leu Ser Pro Ala Lys Leu Thr Pro His Pro Pro Val Asp Asn
65 70 75 80



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

Met Cys Gly Ile Val Ser Gly
1 5 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 237 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Met His Met Lys Asn Ile Lys Lys Asn Gln Val Met Asn Leu Gly Pro
1 5 10 15

Asn Ser Lys Leu Leu Lys Glu Tyr Lys Ser Gln Leu Ile Glu Leu Asn
20 25 30

Ile Glu Gln Phe Glu Ala Gly Ile Gly Leu Ile Leu Gly Asp Ala Tyr
35 40 45

Ile Arg Ser Arg Asp Glu Gly Lys Thr Tyr Cys Met Gln Phe Glu Trp
50 55 60

Lys Asn Lys Ala Tyr Met Asp His Val Cys Leu Leu Tyr Asp Gln Trp
65 70 75 80

Val Leu Ser Pro Pro His Lys Lys Glu Arg Val Asn His Leu Gly Asn
85 90 95

Leu Val Ile Thr Trp Gly Ala Gln Thr Phe Lys His Gln Ala Phe Asn
100 105 110

Lys Leu Ala Asn Leu Phe Ile Val Asn Asn Lys Lys Thr Ile Pro Asn
115 120 125

Asn Leu Val Glu Asn Tyr Leu Thr Pro Met Ser Leu Ala Tyr Trp Phe
130 135 140

Met Asp Asp Gly Gly Lys Trp Asp Tyr Asn Lys Asn Ser Thr Asn Lys
145 150 155 160

Ser Ile Val Leu Asn Thr Gln Ser Phe Thr Phe Glu Glu Val Glu Tyr
165 170 175

Leu Val Lys Gly Leu Arg Asn Lys Phe Gln Leu Asn Cys Tyr Val Lys
180 185 190

Ile Asn Lys Asn Lys Pro Ile Ile Tyr Ile Asp Ser Met Ser Tyr Leu
195 200 205

Ile Phe Tyr Asn Leu Ile Lys Pro Tyr Leu Ile Pro Gln Met Met Tyr
210 215 220

Lys Leu Pro Asn Thr Ile Ser Ser Glu Thr Phe Leu Lys
225 230 235

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17 
CGCTAGGGAT AACAGGGTAA TATAGC 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18 
GCGATCCCTA TTGTCCCATT ATATCG 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19 
TTCTCATGAT TAGCTCTAAT CCATGG 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20 
AAGAGTACTA ATCGAGATTA GGTACC 
(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21 
CTTTGGTCAT CCAGAAGTAT ATATTT 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22 
GAAACCAGTA GGTCTTCATA TATAAA 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23 
TAACGGTCCT AAGGTAGCGA AATTCA 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24 

ATTGCCAGGA TTCCATCGCT TTAAGT 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 26 base pairs 



(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25 
TGACTCTCTT AAGGTAGCCA AATGCC 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26 
ACTGAGAGAA TTCCATCGGT TTACGG 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27 
CGAGGTTTTG GTAACTATTT ATTACC 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28 
CCTCCAAAAC CATTGATAAA TAATGG 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29 
GGGTTCAAAA CGTCGTGAGA CAGTTT 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30 
CCCAAGTTTT GCAGCACTCT GTCAAA 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31 
GATGCTGTAG GCATAGGCTT GGTTAT 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32 
CTACGACATC CGTATCCGAA CCAATA 
(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33 
CTTTCCGCAA CAGTATAATT TTATAA 
(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34 
GAAAGGCGTT GTCATATTAA AATATT 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35 
ACCATGGGGT CAAATGTCTT TCTGGG 
(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36 
TGGTACCCCA GTTTACAGAA AGACCC 
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37 
GTGCCTGAAT GATATTTATT ACCTTT 
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38 
GTGCCTGAAT GATATTTATT ACCTTT 
(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39 

CAACGCTCAG TAGATGTTTT CTTGGGTCTA CCGTTTAAT 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 39 base pairs 



(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40 
GTTGCGAGTC ATCTACAAAA GAACCCAGAT GGCAAATTA 
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41 
CAAGCTTATG AGTATGAAGT GAACACGTTA TT 
(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42 
GTTCGAATAC TCATACTTCA CTTGTGCAAT AA 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43 
GCTATTCGTT TTTATGTATC TTTTGCGTGT AGCTTTAA 






(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
CGATAAGCAA AAATACATAG AAAACGCACA TGGAAATT 38 
(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
CCAAGCTCGA ATTCGCATGC TCTAGAGCTC GGTACCCGGG ATCCTGCAGT CGACGCTAGG 60 
GATAACAGGG TAATACAGAT 80 
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 
GGTTCGAGCT TAAGCGTACG AGATCTCGAG CCATGGGCCC TAGGACGTCA GCTGCGATCC 60 
CTATTGTCCC ATTATGTCTA 80 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 
ATCAGATCTA AGCTTGCATG CCTGCAGGTC GACTCTAGAG GATCCCCGGG TACCGAGCTC 
GAATTCACTG GCCGTCGTTT 
(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 
TAGTCTAGAT TCGAACGTAC GGACGTCCAG CTGAGATCTC CTAGGGGCCC ATGGCTCGAG 
CTTAAGTGAC CGGCAGCAAA 
(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
TACAACGTCG TGACTGGGAA AACCCTGGCG TTACCCAACT TAATCGCCTT GCAGCACATC 
CCCCTTTCGC CAGCTGGCGT 
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 






ATGTTGCAGC ACTGACCCTT TTGGGACCGC AATGGGTTGA ATTAGCGGAA CGTCGTGTAG 60 
GGGGAAAGCG GTCGACCGCA 80 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 

TAGGGATAAC AGGGTAAT 18 

(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 
ATCCCTATTG TCCCATTA 18 



